The points-to problem is the problem of determining the possible run-time targets of pointer 
variables and is usually considered part of the more general aliasing problem, which consists 
in establishing whether and when different expressions can refer to the same memory address. 
Aliasing information is essential to every tool that needs to reason about the semantics of programs. 
However, due to well-known undecidability results, for all interesting languages that admit aliasing, 
the exact solution of nontrivial aliasing problems is not generally computable. This work focuses 
on approximated solutions to this problem by presenting a store-based, flow-sensitive points-to 
analysis, for applications in the field of automated software verification. In contrast to software 
testing procedures, which heuristically check the program against a finite set of executions, the 
methods considered in this work are static analyses, where the computed results are valid for all 
the possible executions of the analyzed program. We present a simplified programming language 
and its execution model; then an approximated execution model is developed using the ideas 
of abstract interpretation theory. Finally, the soundness of the approximation is formally proved. 
The aim of developing a realistic points-to analysis is pursued by presenting some extensions to the 
initial simplified model and discussing the correctness of their formulation. This work contains 
original contributions to the issue of points-to analysis, as it provides a formulation of a filter 
operation on the points-to abstract domain and a formal proof of the soundness of the defined 
abstract operations: these, as far as we know, are lacking from the previous literature. 

Categories and Subject Descriptors: F3.1 [Logics and Meanings of Programs]: Specifying 
and Verifying and Reasoning about Programs. 

General Terms: Languages, Static Analysis. 
1. INTRODUCTION 
1.1 The Aliasing Problem 

In imperative programming languages the concept of memory location is of primary
importance; it refers to an entity able to keep a finite quantity of information
across the subsequent steps of the computation. The concept of variable is then
developed as a way to refer to memory locations. In different languages, different
constructs allow for the composition of variable names so as to form expressions
(Listing 1). The use of these constructs makes it possible to refer to the
same memory location with different expressions. In the literature, two expressions
referring to the same memory location are said to be aliases; the set of pairs of alias
expressions is commonly referred to as alias information, and the aliasing problem
is the problem of analyzing the alias information of a program. Due to
the many mechanisms that can generate aliases, the aliasing problem
is complex even to characterize. The following paragraphs show how the different
constructs of the C language can affect the alias information.
1 struct S {
2   struct S *l, *r;
3   int key;
4 } a[10];
5 int i;
6
7 a[i].l->key = ...;

Listing 1: different constructs of the C language can be used to compose variables
into expressions. Note at line 7 the use of the dereference operator together with the
index and field selectors in the same expression. Many constructs are available, and
analysing all their possible interactions is a complex problem.



1 int a[10], i, j;
2 ...
3 if (i == j) {
4   ...
5   a[i] = a[j];
6   ...
7 }

Listing 2: this example shows how the use of arrays may produce aliasing. At line 5 
the variables 'i' and 'j' hold the same value; then the expressions 'a[i]' and 'a[j]'
denote the same memory location, i.e., they are aliases. 



1 int a[10], *p, *q, dist;
2 ...
3 dist = q - p;
4 ...

Listing 3: the value assigned to the variable 'dist' at line 3 depends on the distance 
between the elements referred to by the pointers 'p' and 'q'. 



1.1.1 Aliasing From the Use of Arrays. The example presented in Listing 2
shows how, through the array indexing mechanism, the aliasing problem
is influenced by the value of integer variables. As shown by Listing 3, the
converse also holds: the value of pointer variables, typically considered an alias-related
issue, can influence the value of integer variables.

1.1.2 Aliasing From the Use of Pointers. The simple example in Listing 4 shows
how the use of pointers can produce aliasing. In the C language the support for
pointers is particularly flexible and powerful. For instance, multiple levels of indirection
are allowed (Listing 5). These characteristics make the development of
alias analyses for the C language a challenging problem. The study of the aliasing
problem must also cover recursive data structures; the use of these can
produce particularly complex alias relations (Listing 6).
1 int a, *p;
2 p = &a;
3 ...

Listing 4: after the execution of line 2, the pointer variable 'p' contains the address 
of the variable 'a'; then the expressions '*p' and 'a' are aliases. 



1 int a, *p, **pp;
2 p = &a;
3 pp = &p;
4 ...

Listing 5: at line 2 the address of 'a' is assigned to 'p'; as a consequence, '*p' and
'a' become aliases. At line 3 the address of 'p' is assigned to 'pp'; as a consequence,
'*pp' and 'p' become aliases. Hence, the expressions '**pp' and '*p' are also aliases.
Finally, by applying the transitive property, it is possible to conclude that '**pp'
and 'a' are aliases too.



1 struct List {
2   struct List *next;
3   int key;
4 };
5 ...
6 struct List head;
7 head.next = &head;

Listing 6: this example shows how recursive data structures can affect the aliasing
problem. After the assignment at line 7, the expressions 'head.next->key',
'head.next->next->key', and more generally each expression of the form
'head.(next->)^n key' with n ∈ N, are all aliases of 'head.key'. Even a simple
example can produce an infinite set of alias pairs.



1.1.3 Aliasing Subproblems. Due to the many aspects that must be taken into
account in order to provide complete coverage of the aliasing problem, different
areas of research have been developed; as a result, a wide range
of analyses is available in the literature, encompassing all the alias subproblems: while a
pointer analysis attempts to determine the possible run-time values of pointer variables,
a shape analysis focuses on the precise approximation of the aliasing relations
produced by recursive data structures, and a numerical analysis is required to
track the value of array indices.

1.2 A Static Analysis 

The goal of this work is to present an automated method able to prove certain 
alias properties of programs given in input. In the following we use the term alias 
analysis to refer to the general and theoretical ideas to approach the alias problem; 
whereas we use the term alias analyzer to stress the focus on the implementation of
an automated analysis. We are interested in defining a static analysis. Commonly,
in the context of software analysis, the adjective static applied to the term analysis
designates a class of methods that avoid the actual execution of the examined
program. In other words, a static analysis can be described as the process of
extracting semantic information about a program at compile time. Static analysis
techniques are necessary to any software tool that requires compile-time information
about the semantics of programs. Consider indeed the following points.

— The termination problem is undecidable; as a consequence any method that requires
the execution of the program is not guaranteed to terminate.

— If the execution of the program is performed then the computational complexity
of the analysis is bounded from below by the computational complexity of the
analyzed program.

— Testing a program on some executions can prove the presence of errors; however,
unless all of the possible executions are tried, it cannot prove the absence of errors.
More generally, since a program can have an unbounded number of distinct
executions, testing can only prove that a property holds on some executions, but
it cannot prove that it holds always.

Hence, the existence of analysis methods that avoid the actual execution of the
program is motivated by constraints on the cost of the analysis,
by the need for this cost to be predictable, or by the need to verify a property against all
of the possible executions. Usually, the results of an alias analysis are only an
intermediate step in the computation of a complete static analysis tool; this means
that an alias analysis is commonly intended to answer questions formulated by
other automatic analyses. For instance, compilers are the most common tools that
exploit alias information: almost all modern compilers include some
kind of alias analysis. From the practical perspective, the kind of queries that are
posed to the alias analyzer is greatly influenced by the final application; whereas
from the theoretical point of view it is useful to assume that the questions posed
to the alias analysis are always of the form: does the property P hold on all/some
executions of the program?

1.2.1 One Program, Many Executions. Generally, the flow of the execution depends
not only on the program's source code but also on external sources of information,
e.g., the user's input or a random number generator; when many execution
paths are possible, a property may hold on some but not on all the possible executions
(Listing 7). In the following we refer to a function declared as 'int rand()'
as a source of non-determinism; we assume that this function always halts, that it
can return both zero and non-zero values, and that it has no side effects on the caller.

1.2.2 The Aliasing Problem Is Undecidable. The problem of determining the 
alias properties of a program is undecidable; it is indeed possible to reduce a problem 
that is well known to be undecidable, the halting problem, to the aliasing problem. 
In the sequel, we refer to a function declared as 'int turing(int n)'; we assume 
that (1) this function is defined somewhere in the source code and it emulates the 
execution on the input n of some Turing machine; (2) the result of the execution 
of the emulated Turing machine is returned to the caller as the return value of 
1 int a, b, *p;
2 p = &a;
3 if (rand())
4   p = &b;
5 ...

Listing 7: at line 2 the address of 'a' is assigned to the variable 'p'; then at line 3,
during all executions, '*p' is an alias of 'a'. At line 4 the address of 'b' is assigned
to 'p', but this statement is executed only when at line 3 the call to rand() returns
a non-zero value. Therefore, it is possible to prove that there exists at least one
execution path that reaches line 5 in a state where '*p' is an alias of 'a' and also
there exists at least one execution that reaches line 5 in a state where the same
property is false.



the function; (3) calling this function has no side effects on the caller environment.
Listing 8 highlights how the aliasing problem is influenced by the halting problem.
For this reason the aliasing problem is formulated assuming reachability as a
hypothesis. This assumption is not always valid but it is safe, or conservative. In
Listing 8 it is not possible to tell if line 5 will ever be reached; however, if it were reached,
what would happen?¹ More generally the question is: if the execution reaches the
program point p, does the property P hold at p? The results of the analysis are then
expressed as an implication of the kind: if p is reached then P holds. However,
even in this weaker form, the aliasing problem is still undecidable. Consider for
instance Listing 9, where line 7 is reached if and only if the call 'turing(K)' at
line 3 halts; in this case the value of 'p' is determined by the return value of
'turing(K)'. As a consequence of Rice's theorem [HMRU00], even assuming that
'turing(K)' halts, there exists no algorithm able to tell for every 'K' if the execution
reaches line 7 in a state where 'p' points to 'a'.

1.2.3 Summing Up. This section summarizes the various possibilities just presented.
Let P be an alias property and ¬P its negation. There exist four possible
cases.

(1) The property P holds on all of the possible executions or, equivalently, ¬P never
holds (Listing 7).

(2) The property P holds on some but not on all of the possible executions; that
is, there exists at least one execution in which P holds and also there exists at
least one execution in which ¬P holds (Listing 7).



¹The idea and the motivations behind this approach are similar to those that drive the development
of Hoare's logic for partial correctness specification, as opposed to the total correctness specification,
both introduced in [Hoa03]. The concept of Hoare's triple for partial correctness is introduced:
it is a triple {P} C {Q} where C is a command of a given programming language and P and Q are 
two propositions expressed in some fixed first order logic language. Informally, in Hoare's logic 
the triple {P} C {Q} is said to be true if whenever C is executed in a state satisfying P and the 
execution of C terminates then the resulting output state satisfies Q. 
1 int K = ...;
2 int *p;
3 p = 0;
4 turing(K);
5 *p = 1;

Listing 8: at line 4 the call to the function 'turing' starts the computation of the
Turing machine. Suppose that the call halts; in this case the execution reaches
line 5 causing an error due to a dereferenced null pointer. However, the problem
of telling whether the execution of a Turing machine will ever halt is undecidable:
there exists no algorithm able to tell for each possible value of 'K' if line 5 will
ever be reached by the execution, and thus whether there exists an execution path where a
null pointer is dereferenced.



1 int K = ...;
2 int *p, a, b;
3 if (turing(K))
4   p = &a;
5 else
6   p = &b;
7

Listing 9: an example of the possible interactions of the aliasing problem and other
undecidable problems. There exists no algorithm able to tell for every 'K' if there
exists an execution that reaches line 7 in a state such that 'p' points to 'a'.



(3) The property P holds on some executions but it is not known if it holds always;
that is, there exists at least one execution in which P holds but it is unknown
whether there exists an execution in which ¬P holds (Listing 10).

(4) It is not known if there exists an execution in which P holds and also it is
unknown whether there exists an execution in which ¬P holds (Listing 9).

For instance, suppose that P expresses the absence of some kind of error. The first
of the listed cases is the optimal case: it has been proved that no errors are possible.
The second case is equally conclusive: it has been proved that there exists at least one
erroneous execution, that is, the program contains a bug. In the third and the fourth
cases the outcome is unknown, i.e., the absence of errors cannot be proved. However, assuming
reachability as a hypothesis, alias analyses cannot prove the result described in
the second case. In other words, every static analysis that assumes reachability
as a hypothesis can only prove that P holds always. In this sense, testing procedures
are complementary to static analysis techniques.

1.3 Applications 

The alias information is required by many static analyses; this is due to the following
fact: analyzing an indirect assignment, and generally an indirect memory reference,
without knowing alias information requires assuming that the assignment may
modify almost anything and, under these hypotheses, it is unlikely that the client
1 int K = ...;
2 int *p, a;
3 p = &a;
4 if (rand())
5   if (turing(K))
6     p = 0;
7 *p = 1;

Listing 10: line 5 is reached only when the return value of the call to 'rand()'
evaluates to true. Thus, line 6 is reached only when the execution reaches line 5
and the call 'turing(K)' halts and the return value evaluates to true. Certainly there
exist executions that reach line 7 in a state where 'p' points to 'a'. However, even
assuming that 'turing(K)' halts, there exists no algorithm able to tell for every K
if there exists an execution path that reaches line 7 with 'p' equal to null.



1 void f ( int *p) { 

2 ... 

3 *p = 0; 

4 ... 

5 } 

Listing 11: analyzing this fragment of code without any aliasing information would
require a worst-case assumption about the locations pointed to by 'p', that is, all the
possible targets of an 'int*' can be modified by the assignment at line 3.

analysis will be able to deduce any useful result (Listing 11). As far as
the final application is concerned, there are two main areas where the aliasing information is
commonly used.

— Optimization and parallelization; used in compilers and interpreters.
— Program semantics understanding and verification; used in debugging and verification
tools.

These two uses have vastly different requirements on alias analyses. For compiler-oriented
applications there exists some upper bound on how much precision is useful.
Various studies [HP00; HP01] state that this upper bound is reached
by the current state of the art. For the use in program understanding/verification
the picture is different; in this case there is instead a lower bound on precision,
below which alias information is of little use. It is commonly believed that the
spectrum of techniques currently available does not fully cover the requirements
of this kind of use: more research work is necessary.

1.3.1 Client Analyses. This section presents a brief list of the most common
static analyses that require the aliasing information.

Mod/Ref analysis. This analysis determines what variables may be modified/referenced²
at each program point. This information is subsequently used by other



²Here the term 'referenced' means that the value of the object is read.
analyses, such as reaching definitions and live variable analysis. Each dereference in
the program generates a query of the alias information to determine the referenced
objects, which are thus classified as modified or referenced depending on the context
in which the dereference operator occurs. For example, in assignment statements,
the objects referred to by the last dereference of the lhs are marked as modified; all
other objects referred to in the evaluation of the rhs and the lhs are instead marked
as read.

Live variable analysis. It is common to many imperative languages that the life
of a local variable starts at the point of definition and ends at the end of the
scope that contains the definition. With the aim of minimizing the memory usage
of the compiled program, while keeping its semantics unchanged, it is possible to
defer the creation to the point where the variable is first assigned and anticipate
its destruction to the last point where its value is used. The live variable analysis
tries to compute this information, which is useful to compilers for register allocation,
detecting the use of uninitialized variables and finding dead assignments.

Reaching definitions analysis. This analysis determines what variable definitions may reach
(in an execution sense) a program point. This information is useful in computing
data dependences among statements, which is an important step for the process of
code motion and parallelization.

Interprocedural constant propagation. This analysis tracks the value of constants
throughout the program and uses this information to statically evaluate conditionals
with the goal of detecting whether a branch is unreachable, thus allowing the detection of
unreachable code.

1.4 Background 

Probably due to the different areas of application, historically this field of research
has treated as separate two fundamental aliasing-related problems: the may alias
and the must alias problem. If the general interest of aliasing-related static analyses
is the study of how different expressions lead to the same memory location, these
two specializations can be characterized as follows.

May alias. It tries to find the aliases that occur during some execution of the
program.

Must alias. It tries to find the aliases that occur on all the executions of the program.

Results exist that confirm that the former problem is not recursive³ while the latter
is not recursively enumerable⁴ [Lan92]. In recent developments the same concepts
are also expressed in terms of possible and definite alias properties. The term
definite alias property is used to designate an alias property that holds on every
possible execution; whereas a possible alias property P is such that neither P nor
¬P can be proved to be definite. Unfortunately, the mismatch between the
naming and the notation used in the published works is not limited to the case just 



³A problem P is said to be recursive, or decidable, if there exists an algorithm that terminates
after a finite amount of time and correctly decides whether or not a given input belongs to the set
of the solutions of P.

⁴A recursively enumerable problem P is a problem for which there exists an algorithm A that halts
on a given input n if and only if n is a solution of P.
Listing 12: a program that exposes a simple alias relation.



described. For instance, in the literature the names pointer analysis, alias analysis
and points-to analysis are often used interchangeably. As suggested by [Hin01], we
prefer to consider the points-to analyses as a proper subset of the alias analyses.
An alias analysis attempts to determine when two expressions refer to the same
memory location; whereas a points-to analysis [And94; EGH94; HBCC99] is focused
on determining what memory locations a pointer can point to. Points-to methods
are also characterized by the same representation of the aliasing information. As
described in [Hin01], the representation of the alias information is only one of the
several parameters that can be used to categorize alias analyses.

Representation. For the representation of alias information various options are 
possible. 

Complete alias pairs. With this representation all the alias pairs produced by the 
analysis are stored explicitly. 

Compact alias pairs. Only a subset of alias pairs is kept explicitly. The complete
relation can be derived by applying the dereference operator and the transitivity
and symmetry properties to the pairs explicitly stored.

Points-to pairs. This representation tracks only the relations between the pointers
and the pointed objects. The complete alias relation can be derived from the
points-to information in a way similar to what is done for the compact alias pair
representation. This process is informally described in [Ema93].

For instance, the alias relation generated by the sequence of assignments in Listing 12
can be represented using the points-to form as

{(p, i), (q, p), (r, i)}.

This corresponds to the complete alias pair set⁵

{(*p, i), (*q, p), (**q, i), (*r, i), (r, p), (*r, *p), (r, *q), (*r, **q)}.

Note that these representations (complete, compact and points-to) are listed in
order of decreasing expressive power: the rules of deduction used to infer the
complete alias relation from the compact and the points-to format impose a precise
structure on the relation. In the following we present some examples to show how the
points-to representation can be less precise than the alias representation (Section 4).
On the other hand these deduction rules make it possible to reduce the set of pairs that have
to be explicitly represented, thus decreasing the cost of the analysis. Note also
that, due to recursive data structures, the complete alias relation may contain an
infinite number of pairs. If one of the possibilities to overcome this problem is



⁵In this case we have omitted the alias pairs that can be obtained by symmetrically
closing this relation.
to adopt a compact or a points-to representation, other solutions, specialized in
the handling of recursive data structures, exist. These methods use formalisms quite different
from the ones presented here and they have generated a largely independent
field of research that is named shape analysis. An example of these alternative
representations is briefly described in Section 2.5.2.

Flow-sensitivity. The question is whether the control-flow information of the
program is used by the analysis. By not considering control-flow information, and
therefore computing only a conservative summary of it, flow-insensitive analyses
compute one solution for either the whole program or for each function [And94;
Ste96; HBCC99], whereas a flow-sensitive analysis computes a solution for each
program point [EGH94; HBCC99]. Therefore, flow-insensitive methods are generally
more efficient but less precise than flow-sensitive ones.

Context-sensitivity. The point is whether there is a distinction between the different
callers of a function, that is, whether the caller-context information is used when analyzing
a function. If this is not the case, the information can flow from one call site (say
caller A) through the called function (the callee) and then back to a different call site
(say caller B), thus generating a spurious data flow in the computed solution on the
code of the caller B. Whenever a static analysis combines information that reaches
a particular program point via different paths some accuracy may be lost. An
analysis is context-sensitive to the extent that it separates information originating
from different paths of execution. Because programs generally have an unbounded
number of potential paths, a static analysis must combine information from different
paths; in this sense, context sensitivity is not a dichotomy but rather a matter
of degree.

Heap modeling. The analysis of heap-allocated objects requires different strategies
from that of stack-allocated and global memory objects. First, heap
objects have a different life cycle with respect to automatic and global variables;
second, the term heap modelling is commonly but improperly used to refer to the
modelling of recursive data structures, as these are usually allocated on the heap.
Various trade-offs between precision and efficiency exist also for this problem.

— The simplest solution consists in creating a single abstract memory location to
model the whole heap [EGH94].
— Another solution distinguishes between heap-allocated objects on the basis of
the program point in which they are created, that is, objects are named by the
creating statement (context-insensitive naming).
— A more precise solution names the objects not only by the program point of
the creating statement but with the whole call path (context-sensitive naming).
For example, this means that if the program contains a user-defined function for
memory allocation (e.g., a wrapper of the 'malloc' function) then the analysis
is able to discern objects created by different calls of the allocation routine.
— Shape analysis methods adopt a quite different approach to the problem of
naming locations, which is based on the expression used to refer to the memory
location.

Whole program. Does the analysis method require the whole program or can 
a sound solution be obtained by analyzing only its components? In the current 



Definition and Implementation of a Points- To Analysis " 13 

panorama of software development, component programming and the use of libraries
are becoming more and more popular. This trend requires the capability to analyze
fragments of code, as the whole program may not be available [LLV].

Language type model. In strongly typed languages, the type information (which
can be easily extracted from the source code using common compiler techniques)
can be used by the alias analysis to deduce reliable information about the layout
of pointers. This information, joined with other assumptions on the memory model
that usually accompany this kind of languages, can greatly simplify the formulation
of the alias analysis. However, as noted in [WL95], a pointer analysis algorithm cannot
safely rely on high-level type information for C programs. Because of arbitrary
type casts and union types, the defined types can always be overridden. This means
that type information cannot be used to determine which memory locations may
contain pointers. To be safe, an analysis must assume that any memory location
could potentially contain a pointer to any other location. Similarly, any assignment
could modify pointers, even if it is defined to operate on non-pointer types.

Aggregate modeling. This point regards how aggregate types are treated: the
main question is whether the subelements are distinguished or collapsed into one
object. The choice of the analyzed language is of primary relevance: this task is
particularly complex to address in weakly typed languages such as C/C++; in these
languages the same memory area can be read using different types. An analysis that
aims to precisely track pointers to fields must then consider the possible overlapping
between the memory layouts of the different types. In strongly typed languages like
Java this difficulty does not exist, as these languages do not allow for reading the
memory with a type different from that used for the allocation.

1.5 The State of the Art 

Static analysis originally concentrated on Fortran and it was predominantly confined
to a single procedure (intra-procedural analysis). Since the emergence of
the C language, static analysis of programs with dynamic storage and recursive
data structures has become a field of active research producing methods of ever
increasing sophistication. In [Hin01] it is noted that, during the past two decades,
over seventy-five papers and nine Ph.D. theses have been published on alias analysis,
leading the author to the question: given the tomes of work on this topic,
haven't we solved this problem yet? The answer is that though many interesting
results have been obtained, still many "open questions" remain. As shown in the
introduction, even when limited to the analysis of pointers, the aliasing problem is still
undecidable [Lan92]; therefore, the main question that arises in approaching it is about
the desired trade-off between the efficiency of the algorithm and the precision of the
approximated solution computed. A wide range of worst-case time complexities is
available: from almost linear [Ste96] to exponential [Deu94]. The current research
effort is proceeding in at least two distinct directions: improving the efficiency of
the analyses while keeping the current precision, and increasing the precision of the
approximation while keeping a reasonable computational cost.

1.5.1 Improving the Efficiency. Again in [Hin01], the problem of scalability is
listed among the "open questions". On this topic two distinct efforts are currently
active and both proceed toward the goal of analysing programs of ever increasing
size. Today, flow-insensitive analyses [Ste96; LLV] can quickly analyze million-line
programs. It is commonly believed that the precision provided by these fast
methods is sufficient to satisfy ordinary compiler-oriented client analyses [Hin01];
but they definitely do not suffice for verifier-oriented applications [OR06; WMD08].
On the other side various works [HBCC99] have increased the efficiency of the more
precise but slower flow-sensitive methods with respect to the initially proposed
methods [EGH94]. It must be noted that some studies [EGH94; HP00; HP01] show
that client analyses improved in efficiency as the pointer information was made more
precise because the input size to the client analysis becomes smaller; on average,
this reduction outweighed the initial cost of the pointer analysis. However, these
studies focused on typical compiler-oriented analyses; no data is available for the
field of program understanding/verification.

1.5.2 Improving Precision. Another goal of the current research effort is to improve
the precision without sacrificing scalability. As for the scalability issue,
there are two main directions in which researchers are investigating to
improve the current state of the art. The first tries to reconsider
the notion of safety by loosening the soundness constraints on the analysis.
The other tries to recognize the areas of the source code
that need to be analyzed with greater accuracy; the idea is to perform a quick alias
analysis on the whole program and then refine the first results only in those regions
of the code where more precision is needed. In other fields of static analysis
research this idea has led to the formalization of the concept of demand-driven
analysis [OR06; WMD08]. Demand-driven methods can avoid the costly computation
of exhaustive solutions: given an initial query, the analysis contains the logic
to detect what other information is needed to answer it and then proceeds by
recursively formulating a new set of queries. It is still an open question whether the
precise alias analyses currently available (that is, flow- and context-sensitive analyses
and shape analyses) can be reformulated in a demand-driven fashion [Hin01].

1.5.3 Different Notions of Safety. A reading of the literature available for the
field reveals that there exist two slightly different notions of safety, which are
determined by the different areas of application. Compiler-targeted analyses are
required to produce a safe approximation of the alias information for every
standard-compliant program, thereby allowing the analyzer to assume that the analyzed
program is standard-compliant.^6 From [WL95]:

The possibility of non-pointer values [stored inside pointer variables] is
not always important. For example, when a location is dereferenced, we
can assume that it always contains a pointer value, since otherwise the 
program would be erroneous. 

On the other hand, for software verification tools, the conformance of the analyzed
program to the standard is not a hypothesis but one of the theses that need to be
proved. For example, a desirable feature for a verifier tool would be to signal if a
dereferenced pointer may hold an undefined or a null value. For analyses that cannot



^6 For some notion of standard-compliant; there exist different possible language standards, hence
different notions of standard-compliance.

simply ignore the possibility of errors, the approach called 0-soundness is usually
applied [CDNB08]: when the analysis detects the possibility of an error, the
program point is marked with a warning and the analysis proceeds assuming that
the condition that led to the error does not hold. For example, if the pointer 'p'
has the points-to set {a, null} (i.e., 'p' may point to
the variable 'a' or be null), then the analysis of the statement '*p' would produce
a warning for a possible null pointer dereference and the analysis will continue
assuming that 'p' points only to 'a'. Verifier-targeted analyses are not allowed
to assume the absence of errors; in this sense, the notion of safety required by
compiler-targeted analyses is weaker. However, practical considerations soften the
requirements on verifier analyses. While compilers are required to exhibit a well-defined
behaviour on all conforming programs, verification tools often assume stricter rules
than those dictated by the standard of the programming language, with the result
of restricting the class of analyzable programs to a set of well-behaved ones. For
instance, assuming the absence of some kinds of casts [Act06], it is possible to
simplify the analysis and also improve its precision. For those programs that do
not belong to this restricted set, the analysis produces some false positives^7 and the
process of 0-soundness will erroneously remove from the abstraction some of the
possible executions, yielding an unsafe result. As noted in [Hin01], this can be
acceptable in many areas:

I was told the users actually liked the false-positives in my analysis be- 
cause they claimed when my analysis got confused it was a good indica- 
tion that the code was poorly written and likely to have other problems. 
This came as a complete surprise. While additional study is needed to 
claim these observations to be valid in a broader sense, they lead me 
to conclude that the notion of safety should be reconsidered for many 
applications of static analysis. 

1.5.4 Measuring the Alias Analyses. It is widely accepted that, in the alias
analysis field, the independent verification of published results is a considerably
difficult task. The first consequence of this is the absence of a clear and complete
comparison between the existing methods. The difficulty of reproducing the
publicly available results can be explained by the intrinsic difficulty of defining a
valuable metric for the problem, as a great number of parameters must be taken
into account, such as the chosen intermediate representation, the benchmark suite used
for the testing phase and, more generally, all the details of the infrastructure where
the analysis is put to work. For instance, some analyses [EGH94; HP00] work on
an intermediate representation of the code that results from a simplification phase,
which reduces all expressions to a normal form with the goal of limiting the complexity
of the implementation, as fewer cases need to be considered; however, it also
introduces temporary variables and intermediate assignments to emulate step by
step the evaluation of the original expressions. Since many of the metrics used
depend on the number of variables, this transformation makes it harder, if not
impossible, to compare these methods with other methods that do


^7 A false positive is an error reported by the analyzer which, however, cannot occur in any of the
possible execution paths.

not perform the simplification. Moreover, alias information is not useful on its own,
but it is needed by other client analyses. Thus, the definition of what is a good
trade-off between the cost of the analysis and the precision of the computed solution
inevitably depends on the client applications; it is indeed a common opinion
among researchers that each area of application requires an ad-hoc method or
an adaptation of one described in the literature. The result is that a single metric
that gives an absolute measure of the value of a method does not exist. However,
to help implementors of aliasing analyses determine which pointer analysis is
appropriate for their application, and to help researchers identify which algorithms
should be used as a basis for future advances, some partial metrics have been
proposed [HP01]; the idea is that, since all these metrics have their strengths and
weaknesses, a combination should be used. A first popular metric records for each
pointer variable the number of pointed objects; the idea is that a lower number
of referenced objects would mean more precise alias information. Although this
metric is quite simple to measure, it presents some flaws.

— Due to local variables in recursive functions and the possibility of dynamically 
allocating memory (heap-allocated objects), an alias analysis should be able to 
model an unbounded number of objects. To have a finite representation of the 
set of the possible memory objects, each method defines a finitely representable 
approximation. For example in [EGH94] the whole heap is modeled as a single 
object; in this case the metric will count only one for all the referenced heap- 
allocated objects with the effect of incorrectly suggesting a precise analysis. 

— As anticipated, alias information is used by other client analyses, then its real 
effectiveness can only be measured on the results of the whole process. But there is
no straightforward relation between the results of this metric and the precision
of the client analyses. For example, the removal of a single alias pair may allow
the client analysis to prove the absence of a run-time error otherwise not
provable.

The above metric is usually named direct as it refers to a quantity that is a direct 
result of the analysis. To address the flaws just highlighted, some indirect metrics
have been developed. 

(1) A first kind of indirect metric measures the relative improvement in the precision
of the aliasing information with respect to the worst-case assumption. This
kind of metric is reported to be particularly useful on strongly typed languages,
where the worst-case assumptions are not as bad as in weakly typed languages
like C [Hin01].

(2) A second kind of indirect metric requires implementing a client of the alias
information; it then measures how the precision of the results
of the client analysis varies with the precision of the supplied aliasing
information. The main weakness of this metric is that its results cannot be
generalized to other client analyses.

Comparisons are difficult also as far as performance is concerned. The careful engineering
of a points-to analysis, particularly for flow-sensitive analyses,^8 can dramatically
improve its performance [Hin01]. The worst-case complexities often do not reflect
the mean cost of the algorithm, which is greatly influenced by heuristics developed
on top of the base algorithm; these, however, require a great effort of fine tuning for the
specific target application. However, as criticized in [Hin01], even today most published
papers about new analysis methods seldom present a complete quantitative
evaluation using these guidelines; also, for those works that provide experimental
data, too often the independent verification is missing and the acceptance of the
proposed results becomes a matter of faith.

1.5.5 Notes on the Analysis of the Java Language. The Java language has
emerged as a popular alternative to other mainstream languages in many
areas. Java presents a clean and simple memory model where conceptually all objects
are allocated in a garbage-collected heap. While useful to the programmer,
this model comes at a cost. In many cases it would be more efficient to allocate
objects on the stack, eliminating the dynamic memory management overhead for
those objects. Aliasing analysis makes it possible to detect the cases in which
this simplification can be performed. Another characteristic of the Java language is the
availability of synchronized methods, which ensure that the body of the method is
executed atomically by acquiring and releasing a lock in the receiver object. But
the lock overhead is wasted when only one thread can access the object; the lock is
required only when multiple threads may attempt to access the same object
simultaneously. Also in this case, alias analysis makes it possible to detect which threads can
access an object, thus possibly allowing the removal of the locking code.
Studies have shown [WR99] that it is possible to eliminate a significant number of
heap allocations (in the tests between 22% and 95%) and synchronization operations
(in the tests between 24% and 64%). As far as the realization of
alias analyses is concerned, the Java language, while adding new features like virtual functions
and exception handling, may still be much easier to analyze than the C language
[WL95] because of its strong type system:^9 without type casts and pointer arithmetic,
the type information given by the static type system of the language can be
used to deduce affordable alias information. Another feature of Java simplifies the
analysis algorithm: it does not support pointers into the middle of an object; an
object reference in Java can point only to the beginning of an object. This means
that two pointers either point to exactly the same location or do not; they cannot
point to different offsets within one allocated block of memory, as is possible in
the C language.
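The situation that Java rules out, namely two pointers referring to different offsets within the same allocated block, is easy to reproduce in C. The fragment below is a small sketch written for this purpose; the type `struct pair` and the helper `offset_in` are illustrative names of our own.

```c
#include <assert.h>
#include <stddef.h>

/* A block of memory with two fields. */
struct pair {
    int first;
    int second;
};

/* Byte offset of the field pointed to by `field` inside the block `blk`.
   Both pointers must refer to the same object of type struct pair. */
static ptrdiff_t offset_in(const struct pair *blk, const int *field) {
    assert(blk != NULL && field != NULL);
    return (const char *)field - (const char *)blk;
}
```

Given `struct pair s;`, the pointers `&s.first` and `&s.second` are distinct, yet both point into the block `s`: `offset_in(&s, &s.first)` is 0, whereas `offset_in(&s, &s.second)` equals `offsetof(struct pair, second)`. An analysis for C that tracks pointers to fields has to distinguish these two cases.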

1.6 Organization 

Starting from Section 2, the paper provides a general description of the instruments 
commonly used to approach the points-to and the alias problems. 

Starting from Section 3, a simplified language and a simplified execution model
are introduced; the execution model comprises the memory model and the operations
that act on it. Subsequently, an approximated memory model and the
approximated operations are presented. Following the methodology of
abstract interpretation theory, the soundness of the approximated execution model is proved.
Finally, some informal considerations about the precision of the abstraction are presented.

^8 This is probably due to the greater complexity of flow-sensitive analyses with respect to
flow-insensitive ones. In a more complex method there are more opportunities to improve.
^9 The same consideration holds for all other strongly typed languages.

Starting from Section 5, in order to present a realistic points-to analysis, some
extensions to the model introduced in the previous sections are presented and a
possible implementation of the approximated memory model is described.

Finally, Section 6 draws the conclusions of the work and discusses some
possible future developments.

1.7 Purpose of the Work 

The presented method is targeted at applications in the context of software verification.
Compiler-targeted applications require relatively imprecise alias information,
thus they can rely on fast algorithms for its computation. However, as empirical
studies have shown [HP01; Hin01], for software verification there is a lower
bound of precision below which the points-to information is of little use. For
these reasons, our aim is to develop a points-to analysis that, though less efficient
than other methods based on the same representation, computes a more precise
approximation of stack-allocated objects and is also suitable for integration with
the precise inter-procedural techniques already present in the literature [Ema93;
WL95].

1.8 Contributions 

The present work describes a store-based, flow-sensitive and intra-procedural points-to
analysis working on a relatively high-level intermediate representation of the
source code, which also makes no assumptions about the inter-procedural analysis
model. In particular, beyond the assignment operation (the most essential
operation of a points-to analysis, and thus omnipresent in the papers on the
topic), we describe a filter operation that enables the analysis to increase the
precision of the computed solution by exploiting the expressions used in branching
statements. Moreover, a formal proof of the soundness of the presented operations
is developed.

2. PRELIMINARIES 

This section informally presents the approach used for the definition of the points-to
analysis.

2.1 Notation 

Before proceeding, some clarifications about the notation used are necessary. Let
$A$ and $B$ be two sets. We write '$A \to B$' to denote a total function from the set $A$
to the set $B$; we write '$A \rightharpoonup B$' to denote a partial function from $A$ to $B$. We use
'$B^A \overset{\mathrm{def}}{=} A \to B$' to denote the set of all (total) functions from $A$ to $B$; whereas we
write '$f \colon A \to B$' to mean that $f$ is a (total) function from $A$ to $B$.

We denote as 'Bool' the set $\{0, 1\}$ and as '$\mathrm{Bool}^\sharp$' the set $\wp(\{0, 1\})$; for convenience
of notation we use '$\bot$' to refer to the empty element and '$\top$' for the $\{0, 1\}$ element.
We refer to the complete lattice associated to the set $\mathrm{Bool}^\sharp$ as the structure
$(\mathrm{Bool}^\sharp, \subseteq, \cup, \cap, \{0, 1\}, \emptyset)$.
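Since the powerset of {0, 1} has only four elements, it fits in two bits and its lattice operations reduce to bitwise operations. The following C sketch illustrates one possible encoding; the names `bool_sharp`, `BS_BOTTOM`, `BS_ZERO`, `BS_ONE`, `BS_TOP` and the `bs_*` functions are our own illustrative choices, not part of the formalization.

```c
#include <assert.h>
#include <stdbool.h>

/* Elements of the powerset of {0,1}, encoded as a 2-bit mask:
   bit 0 set means "the value 0 is possible",
   bit 1 set means "the value 1 is possible". */
typedef unsigned bool_sharp;

#define BS_BOTTOM 0u   /* the empty set */
#define BS_ZERO   1u   /* {0} */
#define BS_ONE    2u   /* {1} */
#define BS_TOP    3u   /* {0,1} */

/* Abstraction of a single concrete boolean: b is mapped to {b}. */
static bool_sharp bs_of_bool(bool b) { return b ? BS_ONE : BS_ZERO; }

/* Partial order of the lattice: set inclusion. */
static bool bs_leq(bool_sharp x, bool_sharp y) {
    assert(x <= BS_TOP && y <= BS_TOP);
    return (x & ~y) == 0u;
}

/* Join is set union, meet is set intersection. */
static bool_sharp bs_join(bool_sharp x, bool_sharp y) { return x | y; }
static bool_sharp bs_meet(bool_sharp x, bool_sharp y) { return x & y; }
```

With this encoding the join of {0} and {1} is the top element, and their meet is the bottom element.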

Let $n, m \in \mathbb{N}$, where $n \le m$; we write '$\{n, \ldots, m\}$' to denote the set of the
naturals from $n$ to $m$, i.e., $\{\, i \in \mathbb{N} \mid n \le i \le m \,\}$.

Let $A$, $B$, $C$ be finite sets. We write '$\# A$' to mean the cardinality of the set
$A$. Let $f \colon A \to \wp(B)$ and $g \colon B \to C$, and let $a \in A$ be such that $\# f(a) = 1$; for
convenience of notation we write '$g(f(a))$' to mean $g(b)$ where $f(a) = \{b\}$.

2.2 The Execution Model and Its Operations 

Though our work is ideally targeted at the C language, we need to introduce some
kind of formal execution model. The standard of the C language indeed leaves many
issues implementation-defined, which every execution model is required to specify in
order to provide a working environment for the execution of programs. The literature
provides several such formalizations [BHZ08]; however, for this presentation
many of the details would be useless. With the aim of keeping the notation simple, we
introduce the following concepts. We denote with 'Expr' the set of the expressions
of the language. With execution model we mean a formally specified computing
device able to execute programs written in the analyzed language. With memory
description, or simply memory, we mean a description of the state of the execution
model at some step of the computation.^10 Once the execution model is fixed, we denote
with 'Mem' the set of the memory descriptions. We make few assumptions about
the structure of the memory model; we assume that a memory is composed of a set
of memory locations,^11 denoted as 'Loc'. Given a memory description $m \in \mathrm{Mem}$
and a location $l \in \mathrm{Loc}$, we denote with '$m[l]$' the information that $m$ stores at the
location $l$. We also assume the existence of a partial evaluation function

\[ \mathrm{EVAL} \colon \mathrm{Mem} \times \mathrm{Expr} \rightharpoonup \mathrm{Loc}. \]

In the real world, the execution of a program acts in different ways on the memory 
structure of the computing machine. With the aim of formalizing these interactions, 
we introduce the concept of operation; an operation is defined as a partial function 

\[ \mathrm{OP} \colon \mathrm{Mem} \times \mathrm{Ext} \rightharpoonup \mathrm{Mem} \]

where 'Ext' is an unspecified set that formalizes the use of external information.

Note that we have specified OP as a partial function; this is needed to model
the fact that the possible actions that can be performed on the memory structure
are not defined on all of the possible states. For example, to process the return
statement of a function, the stack of the memory must contain at least one activation
frame. Since we aim to perform a static analysis, we are interested in determining all
the possible memory descriptions that can be generated at a specified program
point. To express the transition from a set of memory descriptions to another as a
consequence of an operation, we extend the definition of the operation OP to sets.



^10 In the formalization of Turing machines [HMRU00], an instantaneous description is a complete
description of the computing device at one of the steps of the computation; here, with memory
description we mean an instantaneous description of the chosen execution model.
^11 Here we use the term memory location as a synonym of memory address. Basically, with location
we mean a tag that can be used to identify the information stored in the memory description.

Let

\[ \mathrm{OP} \colon \wp(\mathrm{Mem}) \times \mathrm{Ext} \to \wp(\mathrm{Mem}) \]

be defined as follows. Let $M \subseteq \mathrm{Mem}$ and $e \in \mathrm{Ext}$; then

\[ \mathrm{OP}(M, e) \overset{\mathrm{def}}{=} \{\, \mathrm{OP}(m, e) \mid m \in M \wedge \mathrm{OP}(m, e) \text{ is defined} \,\}. \]
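The extension of a partial operation to sets of memories can be sketched in C as follows. The memory description used here is deliberately a toy one (it records only the number of activation frames), and the names `mem`, `op_return` and `op_return_set` are hypothetical; the sketch only illustrates how the memories on which the operation is undefined are dropped from the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy memory description: only the number of activation frames on the
   stack. This is an assumption made for illustration purposes. */
typedef struct { int frames; } mem;

/* A partial operation on memories, modelling a function return.
   It is undefined when the stack holds no activation frame: in that case
   false is returned and *out is left untouched. The external information
   `ext` is not needed by this particular operation. */
static bool op_return(mem m, int ext, mem *out) {
    (void)ext;
    if (m.frames < 1) return false;      /* the operation is undefined on m */
    out->frames = m.frames - 1;
    return true;
}

/* Extension to sets: the result collects op_return(m, ext) for every m of
   the input set on which the operation is defined. The input set is an
   array of n memories; results go to `out` (capacity at least n) and
   their number is returned. Duplicates are not removed. */
static size_t op_return_set(const mem *in, size_t n, int ext, mem *out) {
    assert(in != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        mem r;
        if (op_return(in[i], ext, &r)) out[k++] = r;
    }
    return k;
}
```

Applied to the set of memories with 0, 1 and 2 frames, `op_return_set` yields the memories with 0 and 1 frames: the memory with an empty stack contributes nothing to the result.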



Example 1. Consider the modifications to the memory triggered by the declaration
of a local variable. To formalize this event we introduce an operation $\mathrm{NEW}_S$,
which takes the memory description of the execution prior to the declaration, plus
some information about the declaration. In this case, the set 'Ext' represents the
type of the declared variable and, if present, the expression used as initializer. The
returned memory describes the properly updated execution state. Now suppose that
the set $M \subseteq \mathrm{Mem}$ represents the possible memory configurations at a given program
point $p$, which is immediately followed by a local variable declaration. Let $e \in \mathrm{Ext}$ be
the information associated to the declaration; then we express the set of all possible
memory configurations resulting from the declaration as $\mathrm{NEW}_S(M, e)$.

2.3 The Abstract Interpretation Approach 

As shown in the introduction, the aliasing problem is undecidable. Following the
approach proposed by the abstract interpretation theory [CC77; CC79; CC92], to
overcome this limitation we proceed by developing a computable approximation of
the execution model and its operations.

Definition 2.1. (Concrete domain of the aliasing problem.) We define
the concrete domain of the aliasing problem as the complete lattice generated by the
powerset of Mem

\[ (\wp(\mathrm{Mem}), \subseteq, \cup, \cap, \emptyset, \mathrm{Mem}). \]

Then we need to develop an abstract counterpart of the chosen execution model:
an abstract domain $\mathrm{Mem}^\sharp$ that provides an approximation of the concrete domain
$\wp(\mathrm{Mem})$. We formalize $\mathrm{Mem}^\sharp$ as a complete lattice

\[ (\mathrm{Mem}^\sharp, \sqsubseteq, \sqcup, \sqcap, \bot, \top). \]

To formally express the semantics of the approximation we provide a concretization
function

\[ \gamma \colon \mathrm{Mem}^\sharp \to \wp(\mathrm{Mem}). \]

We say that a memory description $m \in \mathrm{Mem}$ is approximated, or abstracted, by an
element of the abstract domain $m^\sharp \in \mathrm{Mem}^\sharp$ when $m \in \gamma(m^\sharp)$. The formalism also
requires the definition of an abstract counterpart $\mathrm{OP}^\sharp$ of the concrete operations OP

\[ \mathrm{OP}^\sharp \colon \mathrm{Mem}^\sharp \times \mathrm{Ext} \to \mathrm{Mem}^\sharp. \]

To prove the soundness of the proposed abstract model it is necessary to show
that, for all $m^\sharp \in \mathrm{Mem}^\sharp$, it holds that

\[ \mathrm{OP}(\gamma(m^\sharp), e) \subseteq \gamma(\mathrm{OP}^\sharp(m^\sharp, e)), \]

that is, the approximation provided by the abstract operation $\mathrm{OP}^\sharp$ is safe with
respect to the concrete operation OP. Beyond the operations already defined on
the concrete execution model^12 Mem, the formalization of the abstraction requires
the definition of other operations that can be described as

\[ \mathrm{OP}^\sharp \colon (\mathrm{Mem}^\sharp)^n \times \mathrm{Ext} \to \mathrm{Mem}^\sharp, \]

along with the corresponding concrete counterpart,

\[ \mathrm{OP} \colon \wp(\mathrm{Mem})^n \times \mathrm{Ext} \to \wp(\mathrm{Mem}). \]

The soundness of these operations is expressed in the same way, that is, for all
$m_1^\sharp, \ldots, m_n^\sharp \in \mathrm{Mem}^\sharp$,

\[ \mathrm{OP}(\gamma(m_1^\sharp), \ldots, \gamma(m_n^\sharp), e) \subseteq \gamma(\mathrm{OP}^\sharp(m_1^\sharp, \ldots, m_n^\sharp, e)). \]
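For a unary operation, the soundness condition can be checked exhaustively on a toy example. The C sketch below is purely illustrative and is not the abstract domain developed in this work: it assumes that a concrete memory is just a stack depth between 0 and `MAX_FRAMES`, that an abstract memory is an interval of depths, and that the operation is the function return (undefined on an empty stack). All the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 4   /* toy bound that makes the check exhaustive */

/* Toy abstract memory: an interval [lo, hi] of possible stack depths.
   Its concretization is the set of depths m with lo <= m <= hi;
   an interval with lo > hi denotes the empty set. */
typedef struct { int lo, hi; } mem_sharp;

/* Concrete partial operation: function return, defined only if m >= 1. */
static bool op_return(int m, int *out) {
    if (m < 1) return false;
    *out = m - 1;
    return true;
}

/* Abstract counterpart: discard depth 0, then shift the interval down. */
static mem_sharp op_return_sharp(mem_sharp a) {
    mem_sharp r;
    r.lo = (a.lo < 1 ? 1 : a.lo) - 1;
    r.hi = a.hi - 1;
    return r;
}

/* Membership in the concretization of an abstract memory. */
static bool in_gamma(int m, mem_sharp a) { return a.lo <= m && m <= a.hi; }

/* Checks, for one abstract element a, that the image of the
   concretization of a under the concrete operation is included in the
   concretization of the abstract result. */
static bool sound_on(mem_sharp a) {
    assert(a.lo >= 0 && a.hi <= MAX_FRAMES);
    for (int m = 0; m <= MAX_FRAMES; m++) {
        int r;
        if (in_gamma(m, a) && op_return(m, &r)
            && !in_gamma(r, op_return_sharp(a)))
            return false;
    }
    return true;
}
```

Running `sound_on` over every interval within the bound confirms the inclusion for this toy abstraction; for the domain presented in this work the corresponding property has to be established by proof rather than by enumeration.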

These additional operations include, for instance, the 'meet' and 'join' operations of
the domain. With a slight change of notation, this definition can be adapted
to describe also the requirement of correctness on the partial order '$\sqsubseteq$', i.e.,

\[ m_0^\sharp \sqsubseteq m_1^\sharp \implies \gamma(m_0^\sharp) \subseteq \gamma(m_1^\sharp). \]



2.4 Queries 

This section introduces the concept of query on a domain. A query defines an
interface on the domain; it helps to isolate the relevant information from other
uninteresting details. When the analysis process is composed of several abstract
domains, the use of queries is useful to formalize the interactions between them.
More details on this approach can be found in [CLV94]. In the following we show
how queries can also be used to formalize the semantics of the abstraction, that is,
how the concretization function $\gamma$ can be expressed in terms of queries. Once the
number of arguments $n$ is fixed, we denote with 'Query' the space of the concrete query
functions and with '$\mathrm{Query}^\sharp$' the space of the abstract query functions:

\[ \mathrm{Query} \overset{\mathrm{def}}{=} (\mathrm{Expr})^n \to \mathrm{Bool}; \]

\[ \mathrm{Query}^\sharp \overset{\mathrm{def}}{=} (\mathrm{Expr})^n \to \mathrm{Bool}^\sharp. \]

The concrete query domain is then defined as the complete lattice generated by the
powerset of 'Query'

\[ (\wp(\mathrm{Query}), \subseteq, \cap, \cup, \emptyset, \mathrm{Query}); \]

whereas the abstract query domain is defined as a complete lattice on the set $\mathrm{Query}^\sharp$,

\[ (\mathrm{Query}^\sharp, \sqsubseteq, \sqcap, \sqcup, \bot, \top), \]

where '$\sqsubseteq$' is the point-wise extension of the ordering of $\mathrm{Bool}^\sharp$; $\bot$ and $\top$ are the
minimum and maximum elements of $\mathrm{Query}^\sharp$ with respect to this ordering, respectively;
'$\sqcap$' and '$\sqcup$' are the obvious point-wise extensions of the operations of $\mathrm{Bool}^\sharp$. Note
that 'Query' can be seen as a subset of '$\mathrm{Query}^\sharp$';^13 from this fact, the concretization



^12 Such as $\mathrm{NEW}_S$, the assignment and all other operations required to define the behaviour of
the concrete execution model.

^13 Consider indeed the injection $f \colon \mathrm{Query} \to \mathrm{Query}^\sharp$ that maps every $Q \in \mathrm{Query}$ to a $Q^\sharp \in \mathrm{Query}^\sharp$
such that, for all $e \in \mathrm{Expr}^n$, $Q^\sharp(e) = \{Q(e)\}$.

[Diagram with nodes $\mathrm{Mem}^\sharp$, Query and $\mathrm{Query}^\sharp$.]

Fig. 1: A representation of the three steps required to define the semantics of an abstract domain
using queries.



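function

\[ \gamma \colon \mathrm{Query}^\sharp \to \wp(\mathrm{Query}) \]

is defined as follows: for all $Q \in \mathrm{Query}$ and $Q^\sharp \in \mathrm{Query}^\sharp$,

\[ Q \in \gamma(Q^\sharp) \iff Q \sqsubseteq Q^\sharp. \]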

In order to define the semantics of $\mathrm{Mem}^\sharp$ in terms of queries, it is necessary to
describe two further steps of the concretization. First we have to define how the
query has to be performed on the concrete domain, that is, how to extract the
relevant information from a concrete memory. In symbols,

\[ \gamma \colon \mathrm{Query} \to \wp(\mathrm{Mem}). \]

Also, we have to define how the query has to be performed on the abstract domain,
i.e.,

\[ \gamma \colon \mathrm{Mem}^\sharp \to \wp(\mathrm{Query}^\sharp). \]

The semantics of the abstraction $\mathrm{Mem}^\sharp$ is then defined as the composition of these
three steps (Figure 1).

2.4.1 The Alias Query. The following definitions present the formal meaning of
the statement '$e_0$ and $e_1$ are aliases in $m \in \mathrm{Mem}$'. Basically, two expressions are
considered aliases in a memory description when they evaluate to the same memory
location.

Definition 2.2. (Concrete alias query domain.) Let

\[ \mathrm{AliasQ} \overset{\mathrm{def}}{=} (\mathrm{Expr} \times \mathrm{Expr}) \to \mathrm{Bool}. \]

We define the concrete alias query domain as the complete lattice generated by the
powerset of AliasQ

\[ (\wp(\mathrm{AliasQ}), \subseteq, \cup, \cap, \emptyset, \mathrm{AliasQ}). \]

Definition 2.3. (Concrete alias query semantics.) Let

\[ \gamma \colon \mathrm{AliasQ} \to \wp(\mathrm{Mem}) \]

be defined as follows. Let $\mathrm{ALIAS} \in \mathrm{AliasQ}$ and $m \in \mathrm{Mem}$; then we define $m \in \gamma(\mathrm{ALIAS})$
when, for all $e, f \in \mathrm{Expr}$, it holds that

\[ \mathrm{ALIAS}(e, f) = \begin{cases} 1, & \text{if } \mathrm{EVAL}(m, e) = \mathrm{EVAL}(m, f); \\ 0, & \text{otherwise.} \end{cases} \]

1 int a, *p;
2 p = &a;
3 a = rand();

Listing 13: in this example the execution can reach line 4 in many different states
due to the different values that the variable 'a' can assume. However, considering
a type based alias analysis (that is, assuming that the analysis tracks only the
value of pointer variables) each of the possible $m \in \mathrm{Mem}$ carries the same aliasing
information.

Given a concrete memory description $m \in \mathrm{Mem}$, we denote as $\mathrm{ALIAS}_m$ the concrete
alias relation that abstracts $m$; we also call $\mathrm{ALIAS}_m$ the alias information of the
memory $m$. As anticipated in Section 2.4, the alias query ALIAS acts as an interface
onto $m \in \mathrm{Mem}$, selecting the interesting details; this idea is shown in Listing 13.
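The idea that the alias query hides the details that are irrelevant to aliasing can be made concrete with a small C sketch based on Listing 13. The expression language below (only 'a', 'p' and '*p'), the type `mem` and the functions `eval` and `alias` are hypothetical simplifications; in particular, treating an undefined evaluation as "not an alias" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy memory for Listing 13: an int 'a' and a pointer 'p'. */
typedef struct { int a; int *p; } mem;

/* A tiny, hypothetical expression language: a, p and *p. */
typedef enum { E_A, E_P, E_DEREF_P } expr;

/* Evaluation of an expression to the location it denotes in m;
   NULL stands for "undefined". */
static const void *eval(const mem *m, expr e) {
    assert(m != NULL);
    switch (e) {
    case E_A:       return &m->a;
    case E_P:       return &m->p;
    case E_DEREF_P: return m->p;   /* NULL when p has no valid target */
    }
    return NULL;
}

/* The alias query of m: true iff e and f evaluate to the same location. */
static bool alias(const mem *m, expr e, expr f) {
    const void *le = eval(m, e);
    const void *lf = eval(m, f);
    return le != NULL && le == lf;
}
```

Two memories in which 'p' holds the address of 'a' but 'a' holds different integers answer every alias query in the same way: for instance, '*p' and 'a' are aliases in both, while 'p' and 'a' are not.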

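Definition 2.4. (Abstract alias query domain.) Let

\[ \mathrm{AliasQ}^\sharp \overset{\mathrm{def}}{=} (\mathrm{Expr} \times \mathrm{Expr}) \to \mathrm{Bool}^\sharp. \]

We define the abstract alias query domain as the complete lattice on the set $\mathrm{AliasQ}^\sharp$

\[ (\mathrm{AliasQ}^\sharp, \sqsubseteq, \sqcup, \sqcap, \bot, \top). \]

The semantics of the abstract alias query domain

\[ \gamma \colon \mathrm{AliasQ}^\sharp \to \wp(\mathrm{AliasQ}), \]

as already specified in Section 2.4, is defined as

\[ \mathrm{ALIAS} \in \gamma(\mathrm{ALIAS}^\sharp) \iff \mathrm{ALIAS} \sqsubseteq \mathrm{ALIAS}^\sharp. \]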
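On a finite set of expressions, membership of a concrete alias query in the concretization of an abstract one amounts to a point-wise inclusion test. The C sketch below assumes a hypothetical set of `N_EXPR` expressions identified by their index, and encodes the abstract boolean answers as 2-bit masks; neither assumption is part of the formal development.

```c
#include <assert.h>
#include <stdbool.h>

#define N_EXPR 3   /* hypothetical finite set of expressions, indexed 0..2 */

/* Abstract booleans as 2-bit masks:
   0 = bottom, 1 = {0}, 2 = {1}, 3 = top. */
typedef unsigned bool_sharp;

/* Tests whether the concrete query q belongs to the concretization of the
   abstract query qs: for every pair (e, f), the singleton {q(e, f)} must
   be included in qs(e, f). */
static bool in_gamma(bool q[N_EXPR][N_EXPR],
                     bool_sharp qs[N_EXPR][N_EXPR]) {
    for (int e = 0; e < N_EXPR; e++) {
        for (int f = 0; f < N_EXPR; f++) {
            assert(qs[e][f] <= 3u);
            bool_sharp single = q[e][f] ? 2u : 1u;
            if ((single & qs[e][f]) == 0u) return false;
        }
    }
    return true;
}
```

An abstract query that answers top everywhere contains every concrete query, whereas an abstract query that answers {0} on a pair rules out all the concrete queries in which that pair is an alias.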

The last step required in order to complete the definition of the semantics of the
abstraction, that is, from $\mathrm{Mem}^\sharp$ to $\wp(\mathrm{AliasQ}^\sharp)$ (Figure 1), depends on the details of
the chosen approximation method $\mathrm{Mem}^\sharp$. The next section presents some of the
available approaches.

2.5 Representation of the Abstract Alias Domain 

Looking ahead to the realization of an alias analyzer, another problem arises.
A realistic implementation cannot aim to directly represent abstract alias queries
(Definition 2.4). As demonstrated in Listing 6, there can be an infinite number
of aliasing pairs, which makes a direct representation impossible. In this sense, the domain
$\mathrm{Mem}^\sharp$ introduces an additional layer of abstraction, providing a representation
suitable for the implementation.

[Diagram with the following nodes: Concrete memory, $m \in \mathrm{Mem}$; Concrete alias query, $\mathrm{ALIAS} \in \mathrm{AliasQ}$; Abstract alias query, $\mathrm{ALIAS}^\sharp \in \mathrm{AliasQ}^\sharp$; Access-path based abstraction (Deutsch); Points-to based abstraction; Abstract memory, $m^\sharp \in \mathrm{Mem}^\sharp$.]

Fig. 2: A representation of the abstraction relations under discussion. Arrows should be read as
'is abstracted by.'



1 struct List {
2   struct List *n;
3   int key;
4 } *x;
5
6 struct Tree {
7   struct Tree *l, *r;
8   int key;
9 } *y;



Listing 14: in this code two recursive structures, List and Tree, are defined. 



2.5.1 Techniques For Approximating the Alias Information. In the literature,
different classes of methods exist. One of these is the class of access-path based 
methods. A brief description of an access-path based method is reported below. 
Another class is identified by the name of store based methods; more details on 
these are presented in Section 2.6. 

2.5.2 A Notable Example of Access-Path Based Approximation. In the literature,
the term access-path is used to designate a simplified form of language expressions.
A notable example of access-path based method for the approximation of the
abstract alias query domain (Definition 2.4) is presented in the Ph.D. dissertation
of A. Deutsch [Deu94]. In this proposal the elements of the abstract domain $\mathrm{Mem}^\sharp$
are formalized as pairs $m^\sharp = (P, C)$ where $P$ is a set of pairs of symbolic access paths
and $C$ is a set of constraints on $P$. A symbolic access path is an approximation
of a set of expressions;^14 the concretization of a symbolic access path is defined
using a mechanism similar to regular expressions. Consider for instance the code
presented in Listing 14, and let $(P, C) \in \mathrm{Mem}^\sharp$ be an abstract memory description
of the program at line 11, such that $C = \{i = j\}$. The set of constraints $C$ has the
set of solutions

\[ \{\, (n, n) \mid n \in \mathbb{N} \,\} \]

in the variables $(i, j)$. Let $p \in P$ be the pair of symbolic access paths

\[ p \overset{\mathrm{def}}{=} \bigl( \texttt{x(->n)}^{i}\texttt{->key},\ \texttt{y(->l,->r)}^{j}\texttt{->key} \bigr). \]

The semantics of $p$ is a set of pairs of concrete expressions and it can be computed
by replacing the occurrences of the variables $i$ and $j$ found in the symbolic access
paths of $p$ with the values given by the solutions of $C$. For instance, by replacing
the occurrences of the index '$j$' with the integer 2 in the symbolic access path

\[ \texttt{y(->l,->r)}^{j}\texttt{->key}, \]

we obtain the regular expression

\[ \texttt{y(->l,->r)}^{2}\texttt{->key}, \]
that can be finally translated into the following set of expressions 

{y->l->l->key, y->r->l->key, y->l->r->key, y->r->r->key}. 

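Depending on the considered solution of $C$, the pair $p$ approximates different sets
of alias pairs. For instance, using the solution $(0, 0)$ we have

\[ \bigl( \texttt{x(->n)}^{0}\texttt{->key},\ \texttt{y(->l,->r)}^{0}\texttt{->key} \bigr) = \{ (\texttt{x->key}, \texttt{y->key}) \}. \]

With the solution $(1, 1)$ we have

\[ \bigl( \texttt{x(->n)}^{1}\texttt{->key},\ \texttt{y(->l,->r)}^{1}\texttt{->key} \bigr) = \{ (\texttt{x->n->key}, \texttt{y->l->key}),\ (\texttt{x->n->key}, \texttt{y->r->key}) \}. \]

Using the solution $(2, 2)$ we obtain

\[
\begin{aligned}
\bigl( \texttt{x(->n)}^{2}\texttt{->key},\ \texttt{y(->l,->r)}^{2}\texttt{->key} \bigr) = \{ & (\texttt{x->n->n->key}, \texttt{y->l->l->key}),\ (\texttt{x->n->n->key}, \texttt{y->r->l->key}), \\
& (\texttt{x->n->n->key}, \texttt{y->l->r->key}),\ (\texttt{x->n->n->key}, \texttt{y->r->r->key}) \}.
\end{aligned}
\]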
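The translation of a path such as y(->l,->r)^k->key into concrete expressions, once the exponent k has been fixed by a solution of C, is a plain enumeration of the 2^k field sequences. The following C sketch shows one way to generate them; the function `expand_path` and its parameters are hypothetical names, and the sketch covers only this particular shape of path, not the general mechanism of [Deu94].

```c
#include <assert.h>
#include <string.h>

/* Writes into `out` the idx-th concrete expression denoted by the path
   base(->l,->r)^k->key, for 0 <= idx < 2^k. Bit j of idx selects the
   j-th field of the sequence: 0 for "->l", 1 for "->r".
   Returns the total number of denoted expressions, that is 2^k. */
static unsigned long expand_path(const char *base, unsigned k,
                                 unsigned long idx, char *out, size_t cap) {
    assert(k < 16 && idx < (1ul << k));
    size_t need = strlen(base) + 3u * k + strlen("->key") + 1u;
    assert(cap >= need);
    strcpy(out, base);
    for (unsigned j = 0; j < k; j++)
        strcat(out, ((idx >> j) & 1ul) ? "->r" : "->l");
    strcat(out, "->key");
    return 1ul << k;
}
```

For k = 2 and idx ranging from 0 to 3 the function produces y->l->l->key, y->r->l->key, y->l->r->key and y->r->r->key, which are the four expressions paired with x->n->n->key in the solution (2, 2).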

Generally, $C$ is a set of constraints on a tuple of indices $I = (i_1, \ldots, i_n)$. The indices
of $I$ also occur in the symbolic access paths of $P$. To each solution $S \colon I \to \mathbb{N}$
of $C$ corresponds a different alias query expressed as a set of pairs of (concrete)
expressions. Given a solution $S$ to $C$, the corresponding alias query, say $P(S)$, can
be obtained from $P$ by replacing every occurrence of the index $i_k$ in $P$ with the
solution $S(i_k)$, for each index $i_k$ of $I$. As shown above, this replacement yields a
set of pairs of no-longer-symbolic access paths. Seen as regular expressions, these
no-longer-symbolic access paths are transformed into a set of pairs of expressions.
The semantics of $\mathrm{Mem}^\sharp$ can be finally expressed in terms of queries as follows. Let
$\mathrm{ALIAS}^\sharp \in \mathrm{AliasQ}^\sharp$ and $(P, C) \in \mathrm{Mem}^\sharp$. We say that $\mathrm{ALIAS}^\sharp \in \gamma((P, C))$ when

\[ \exists S \text{ solution of } C \;.\; \forall e, f \in \mathrm{Expr} : \mathrm{ALIAS}^\sharp(e, f) \in \{\top, 1\} \implies (e, f) \in P(S). \]

^14 The term symbolic access path comes from the original paper [Deu94] and it actually means an
abstraction of the concept of expression. With our notation, the term abstract expression would
probably be used instead.

Note that this has two main consequences. 

— This formulation is unable to represent definite alias properties, that in terms of 
abstract alias queries correspond to the answer '1'; the approximation provided
by this method is indeed also called may-alias information. For example, at line 3
of Listing 7, in all of the possible executions, the expression '*p' is an alias of 'a'.
However, this method is only able to tell that '*p' is possibly an alias of 'a', which
in terms of abstract alias queries corresponds to the outcome $\top$.

— Every solution of C corresponds to a different abstract alias query, whereas, as 
we will show in Section 4, the concretization of a points-to abstraction consists of 
only one abstract alias query. As a consequence, this representation of the alias 
information is able to capture relational information, whereas points-to methods
cannot. 

Different options exist to represent the set of integer constraints $C$. The literature
in this field provides a wide choice of numeric lattices offering different trade-offs
between accuracy and efficiency: from non-relational domains, like arithmetic
intervals and arithmetic congruences, up to relational domains [BHZ08]. The
alias analysis just described is completely parametric with respect to the chosen
numeric domain and, due to the large availability of numeric domains, this is a
point of strength of the method.

2.6 The Store Based Approach 

This section introduces some concepts that are useful to understand the approach
of store-based methods. Points-to analyses are special cases of store-based methods.
The idea common to all store-based methods is the explicit introduction of formal
entities to represent memory locations. As in the concrete situation we use the
notation 'Loc' to represent the set of the memory locations; now we introduce the
notation $\mathrm{Loc}^\sharp$ to denote the set of the abstract locations. Store-based information
usually consists of some sort of compact representation of a binary relation $P$ on
the set of the abstract locations. To bind the concept of location to the concept
of expression an environment function is provided. Basically, the environment
function is needed to resolve identifiers into abstract locations. Denoting with
'Identifiers' the set of identifiers, an environment function can be described as

$$\mathrm{id} \colon \mathrm{Identifiers} \to \mathrm{Loc}^\sharp.$$

1 int **pp, *p, a;
2 struct List {
3   struct List *n;
4   int key;
5 } *h;
6
7 **pp = *p;
8 h->key = h->n->n->key;

Listing 15: in this code the assignments at line 7 and line 8 contain expressions in which the
dereference operator occurs more than once.



Since identifiers are the base case for the definition of the Expr set, from the
elements $\langle P, \mathrm{id} \rangle$ it is possible to build the abstract evaluation function

$$\mathrm{eval} \colon (\mathrm{Mem}^\sharp \times \mathrm{Expr}) \to \wp(\mathrm{Loc}^\sharp).$$

The 'eval' function is defined inductively following the inductive definition of the
'Expr' set. The details depend on the chosen language and intermediate representation;
a complete definition is presented in Section 3. The 'eval' function is then
used to define the semantics of $\mathrm{Mem}^\sharp$ in terms of abstract alias queries; for instance,
a possible definition is the following. Let $m^\sharp \in \mathrm{Mem}^\sharp$ and $\mathrm{alias}^\sharp \in \mathrm{AliasQ}^\sharp$; we say
that $\mathrm{alias}^\sharp \in \gamma(m^\sharp)$ when, for all $e, f \in \mathrm{Expr}$,

$$\mathrm{alias}^\sharp(e, f) = \begin{cases} 0, & \text{if } \mathrm{eval}(m^\sharp, e) \cap \mathrm{eval}(m^\sharp, f) = \emptyset; \\ \top, & \text{otherwise.} \end{cases}$$

This is an oversimplified definition, presented only to give an idea of how a store-based
approximation can answer alias queries; note indeed that we have omitted
to consider definite alias information. Due to the introduction of the set of abstract
locations $\mathrm{Loc}^\sharp$, the semantics of the abstract domain $\mathrm{Mem}^\sharp$ can also be expressed
in terms of the value of locations: we have an abstraction function

$$\alpha \colon \mathrm{Loc} \to \mathrm{Loc}^\sharp,$$

where, for each $l \in \mathrm{Loc}$, $\alpha(l)$ denotes the abstract location that approximates $l$.
Given $l \in \mathrm{Loc}^\sharp$ and an abstraction $m^\sharp \in \mathrm{Mem}^\sharp$, we denote as $m^\sharp[l]$ the value of the
abstract location $l$ in the abstract memory description $m^\sharp$. Now, let $m \in \mathrm{Mem}$ and
$m^\sharp \in \mathrm{Mem}^\sharp$; then we have

$$m \in \gamma(m^\sharp) \iff \forall l \in \mathrm{Loc} : m[l] \text{ is defined} \implies m[l] \in \gamma\bigl(m^\sharp[\alpha(l)]\bigr).$$

This formulation of the semantics of $\mathrm{Mem}^\sharp$ can be applied to points-to methods only,
but it has the advantage that it can be generalized to the case where the points-to
domain is coupled with some other abstract domain, provided that its semantics
can be expressed in the same way. Moreover, the concretization function expressed
in terms of locations is closer to the algorithms actually implemented, as
client analyses are more likely to reason in terms of "pointed locations" than in
terms of "aliased expressions".

1
2 int *tmp0 = *pp;
3 *tmp0 = *p;
4 struct List *tmp1 = h->n;
5 struct List *tmp2 = tmp1->n;
6 h->key = tmp2->key;

Listing 16: this is the simplified version of line 7 and line 8 from Listing 15. Note
the use of the additional variables tmp0, tmp1 and tmp2. Note that all expressions
contain at most one occurrence of the dereference operator.



1 int a, b, *p, *q, **pp;
2 ...
3 if (**pp == &a) {
4   ...
5 }

Listing 17: an example of 'complex' expression occurring in the condition of an if 
statement. 



2.6.1 Practical Considerations on Store Based Methods. Despite the commonalities
of store-based methods described in the previous section, from the implementation
perspective many different options exist. For example, Emami et al. [Ema93;
EGH94] and also [Ghi95] do not define a complete abstract evaluation function
eval. Instead, they prefer to work on a simplified version of the code. To accomplish
this they introduce a simplification phase to be performed before the actual
analysis. Basically, this phase breaks the occurrences of "complex" expressions into
a simpler form by means of the introduction of auxiliary variables and assignments.
For example, in the simplified code all the expressions contain at most one occurrence
of the dereference operator. Listing 16 presents the result of the simplification
phase applied to the code in Listing 15. Having reduced all the expressions to a base
form, the definition of the evaluation function eval is greatly simplified. However,
the simplification phase also has other side effects. First, even assuming that the
correctness of the analysis has already been proved, its results are valid for the code
resulting from the simplification phase; to obtain any formal result on the original code it
must be proved that the applied simplification does not change the semantics of the
code. From the point of view of efficiency, it is unclear whether or not a simpler
evaluation function eval allows a more efficient analysis. In both cases the same
steps of evaluation must be made; the difference is that in one case temporaries are
made explicit. In our approach we have chosen to avoid the simplification phase as
we believe that enabling the analyzer to see complete expressions can improve the
precision.

1
2 int a, b, *p, *q, **pp;
3 int *temp = *pp;
4 if (*temp == &a) {
5
6 }

Listing 18: the result of the simplification of Listing 17.

1 int a, b, c, d, *p, *q;
2 if (rand()) { p = &a; q = &c; }
3 else { p = &b; q = &d; }
4

Listing 19: in this code only two possible executions exist. At line 4, knowing the
value of one of the two pointers 'p' and 'q', it is possible to determine the value of
the other.

Example 2. Assume that at line 3 of Listing 17 we have the following points-to
information:

$$P(pp) = \{p, q\}, \qquad P(q) = \{b\}.$$

Looking at the condition of the if statement at line 3, it is possible to refine the
points-to information of line 4; that is, inside the 'then' branch, 'pp' points only
to 'p'. However, on the simplified code (Listing 18), looking only at the simplified
condition of the if statement, it is not possible to infer any useful information
about 'pp', as it no longer occurs in the expression. It is possible to prove that 'temp'
points only to 'a', but this information is useless as 'temp' is an auxiliary variable
introduced by the simplification phase and thus it is not used elsewhere.

2.7 Precision Limits of the Alias Query Representation 

This section presents an example that highlights the limitations of the alias query 
representation; alias queries (Section 2.4.1) fail to represent relational information. 
For instance, the programs presented in Listings 19 and 20 induce the same abstract
alias query; in particular, for Listing 19 the alias representation is unable to express
that, at line 4, if 'p' points to 'a' then 'q' points to 'c'. This situation is illustrated 
in Figures 3 and 4. 
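
The loss of relational information can be checked with a small C sketch. It assumes, as Figure 3 suggests, that the abstract alias query is the join of the concrete alias queries of the possible executions, with the join of the answers 0 and 1 being ⊤; the type names and the encoding of the variables a, b, c, d as the indices 0 to 3 are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NVARS = 4 };                          /* the variables a, b, c, d */
typedef enum { ANS_NO, ANS_YES, ANS_TOP } Answer;
typedef struct { unsigned p, q; } Exec;      /* targets of p and q in one run */
typedef struct { Answer star_p[NVARS], star_q[NVARS]; } AliasTable;

/* Concrete answer to the query "is *ptr an alias of variable v?". */
static Answer concrete(unsigned target, unsigned v)
{
    return target == v ? ANS_YES : ANS_NO;
}

/* Assumed join on answers: equal answers are kept, different ones give TOP. */
static Answer join(Answer x, Answer y)
{
    return x == y ? x : ANS_TOP;
}

/* Joins the concrete alias queries of the n >= 1 given executions. */
AliasTable abstract_query(const Exec *runs, size_t n)
{
    assert(runs != NULL && n >= 1);
    AliasTable t;
    for (unsigned v = 0; v < NVARS; ++v) {
        t.star_p[v] = concrete(runs[0].p, v);
        t.star_q[v] = concrete(runs[0].q, v);
        for (size_t k = 1; k < n; ++k) {
            t.star_p[v] = join(t.star_p[v], concrete(runs[k].p, v));
            t.star_q[v] = join(t.star_q[v], concrete(runs[k].q, v));
        }
    }
    return t;
}

bool same_table(const AliasTable *x, const AliasTable *y)
{
    for (unsigned v = 0; v < NVARS; ++v)
        if (x->star_p[v] != y->star_p[v] || x->star_q[v] != y->star_q[v])
            return false;
    return true;
}
```

Feeding the two executions of Listing 19, (a, c) and (b, d), and the four executions of Listing 20 to `abstract_query` yields tables that `same_table` cannot distinguish, which is the point made in this section.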

3. THE ANALYSIS METHOD 

This part of the work is meant to be as self-contained as possible. The
aim of this section is to present a few simple but formal definitions of a simplified
but general memory model and, on these, build the algorithms and prove their
correctness.

1 int a, b, c, d, *p, *q;
2 if (rand()) p = &a;
3 else p = &b;
4
5 if (rand()) q = &c;
6 else q = &d;
7

Listing 20: in this code four executions are possible; at line 7, even knowing the
value of one of the two pointers 'p' and 'q', it is not possible to determine the value
of the other.



3.1 The Domain 

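Let $\mathcal{L}$ be a given set that we call the locations set and whose elements are called
locations.

Definition 3.1. (Abstract and concrete domains.) We call support set of
the concrete domain the set $\mathcal{C}$ of the total functions from $\mathcal{L}$ to $\mathcal{L}$; we call support
set of the abstract domain the set $\mathcal{A}$ of the binary relations on the set $\mathcal{L}$:

$$\mathcal{C} \stackrel{\mathrm{def}}{=} \mathcal{L} \to \mathcal{L}; \qquad \mathcal{A} \stackrel{\mathrm{def}}{=} \wp(\mathcal{L} \times \mathcal{L}).$$

We define the concrete domain as the complete lattice generated by the powerset of $\mathcal{C}$,

$$\langle \wp(\mathcal{C}), \subseteq, \cup, \cap, \emptyset, \mathcal{C} \rangle.$$

We define the abstract domain as the complete lattice

$$\langle \mathcal{A}, \subseteq, \cup, \cap, \emptyset, \mathcal{L} \times \mathcal{L} \rangle.$$

Note that from the above definition we have that $\mathcal{C} \subseteq \mathcal{A}$. Though we use the
same notation for the operations of the two lattices, they obviously have different
definitions. For the abstract domain the partial order '$\subseteq$' and the operations '$\cup$' and
'$\cap$' refer to sets of pairs of locations, whereas for the concrete domain they
refer to sets of functions $\mathcal{L} \to \mathcal{L}$. The semantics of the abstract domain
is defined using the fact that $\mathcal{C} \subseteq \mathcal{A}$ and the partial order '$\subseteq$' on sets of pairs of
locations.

Definition 3.2. (Concretization function.) Let

$$\gamma \colon \mathcal{A} \to \wp(\mathcal{C})$$

be defined, for all $A \in \mathcal{A}$, as

$$\gamma(A) \stackrel{\mathrm{def}}{=} \{\, C \in \mathcal{C} \mid C \subseteq A \,\}.$$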
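
As an illustration of Definition 3.2, the following C sketch tests whether a concrete memory belongs to the concretization of an abstract one, and counts the elements of the concretization. It assumes a small finite locations set encoded as the integers 0 to NLOC-1; the bitmask representation and all identifiers are choices made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { NLOC = 4 };              /* hypothetical number of locations */
typedef uint32_t LocSet;        /* bit l set <=> location l is in the set */
typedef struct { LocSet succ[NLOC]; } PointsTo;   /* succ[l] = {m | (l, m) in A} */

/* A concrete memory is a total function: c[l] is the unique target of l.
 * It belongs to gamma(A) iff each of its arcs (l, c[l]) is also in A. */
bool in_gamma(const PointsTo *a, const unsigned c[NLOC])
{
    for (unsigned l = 0; l < NLOC; ++l) {
        assert(c[l] < NLOC);
        if (((a->succ[l] >> c[l]) & 1u) == 0)
            return false;
    }
    return true;
}

/* Picking one target per location gives every element of gamma(A), so the
 * size of gamma(A) is the product of the out-degrees of the locations. */
unsigned long gamma_size(const PointsTo *a)
{
    unsigned long n = 1;
    for (unsigned l = 0; l < NLOC; ++l) {
        unsigned long degree = 0;
        for (unsigned m = 0; m < NLOC; ++m)
            degree += (a->succ[l] >> m) & 1u;
        n *= degree;
    }
    return n;
}
```

In particular the sketch makes explicit that an abstract element in which some location has no outgoing arc has an empty concretization.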

Now we present some definitions useful to define how we navigate the points-to
graph.

[Figure 3, graphics not reproduced. First panel: a graphical representation of the points-to
information associated to the concrete memory description $m_0 \in \mathrm{Mem}$ and, below it, an
extract of the concrete alias query $\mathrm{ALIAS}_{m_0}$ induced by $m_0$. Second panel: as above, on the
concrete memory description $m_1$. Third panel: the abstract alias query
$\mathrm{alias}^\sharp \in \mathrm{AliasQ}^\sharp$, defined as $\mathrm{ALIAS}_{m_0} \sqcup \mathrm{ALIAS}_{m_1}$; it is the most precise abstract alias
query that abstracts the set of concrete alias queries $\{\mathrm{ALIAS}_{m_0}, \mathrm{ALIAS}_{m_1}\}$.]

Fig. 3: a representation of the alias query induced by the code in Listing 19.



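Definition 3.3. (The prev and post functions.) Let

$$\mathrm{prev}, \mathrm{post} \colon \mathcal{A} \times \mathcal{L} \to \wp(\mathcal{L})$$

be defined, for all $A \in \mathcal{A}$ and $l \in \mathcal{L}$, as

$$\mathrm{prev}(A, l) \stackrel{\mathrm{def}}{=} \{\, m \in \mathcal{L} \mid (m, l) \in A \,\};$$
$$\mathrm{post}(A, l) \stackrel{\mathrm{def}}{=} \{\, m \in \mathcal{L} \mid (l, m) \in A \,\}.$$

For convenience we generalize the definition of the post and prev functions to
sets of locations.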



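[Figure 4, graphics not reproduced. First panel: an example of a spurious element of the
concretization of $\mathrm{alias}^\sharp$. We have that $\mathrm{ALIAS}_{m_2} \in \gamma(\mathrm{alias}^\sharp)$; however, $\mathrm{ALIAS}_{m_2}$ cannot be
generated by the program. Second panel: another spurious element of the concretization of
$\mathrm{alias}^\sharp$. Note that
$\gamma(\mathrm{alias}^\sharp) = \gamma(\mathrm{ALIAS}_{m_0} \sqcup \mathrm{ALIAS}_{m_1}) = \{\mathrm{ALIAS}_{m_0}, \mathrm{ALIAS}_{m_1}, \mathrm{ALIAS}_{m_2}, \mathrm{ALIAS}_{m_3}\}$.]

Fig. 4: continuation of Figure 3.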



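Definition 3.4. (Extended prev and post functions.) Let

$$\mathrm{prev}, \mathrm{post} \colon \mathcal{A} \times \wp(\mathcal{L}) \to \wp(\mathcal{L})$$

be defined, for all $A \in \mathcal{A}$ and $L \subseteq \mathcal{L}$, as

$$\mathrm{prev}(A, L) \stackrel{\mathrm{def}}{=} \bigcup \{\, \mathrm{prev}(A, l) \mid l \in L \,\};$$
$$\mathrm{post}(A, L) \stackrel{\mathrm{def}}{=} \bigcup \{\, \mathrm{post}(A, l) \mid l \in L \,\}.$$
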
3.2 The Language 

In this section we present a simple language to model the points-to problem. 

Definition 3.5. (Expressions.) We define the set Expr as the language generated
by the grammar

$$e ::= l \mid {*}\,e$$

where $l \in \mathcal{L}$ and ${*} \notin \mathcal{L}$ is a terminal symbol.

Definition 3.6. (Evaluation of expressions.) Let

$$\mathrm{eval} \colon \mathcal{A} \times \mathrm{Expr} \to \wp(\mathcal{L})$$

be defined inductively on Expr (Definition 3.5). Let $A \in \mathcal{A}$, $l \in \mathcal{L}$ and $e \in \mathrm{Expr}$;
then we define

$$\mathrm{eval}(A, l) \stackrel{\mathrm{def}}{=} \{l\};$$
$$\mathrm{eval}(A, {*}\,e) \stackrel{\mathrm{def}}{=} \mathrm{post}\bigl(A, \mathrm{eval}(A, e)\bigr).$$

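Although it is not necessary for the goal of this section, for completeness we report the
concretization of the points-to abstract domain in terms of abstract alias queries.

Definition 3.7. (Induced alias relation.) We define

$$\gamma \colon \mathcal{A} \to \mathrm{AliasQ}^\sharp$$

as follows. Let $A \in \mathcal{A}$, then let $\gamma(A) \stackrel{\mathrm{def}}{=} \mathrm{alias}^\sharp$, where, for all $e, f \in \mathrm{Expr}$, we have

$$E \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e); \qquad F \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, f);$$

$$\mathrm{alias}^\sharp(e, f) \stackrel{\mathrm{def}}{=} \begin{cases} 0, & \text{if } E \cap F = \emptyset; \\ 1, & \text{if } E = F \wedge \# E = 1; \\ \top, & \text{otherwise.} \end{cases}$$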
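
Definitions 3.4, 3.6 and 3.7 can be sketched in C as follows. This is only an illustration under stated assumptions: the locations set is finite and encoded as the integers 0 to NLOC-1, a set of locations is a bitmask, and an expression is stored as a base location plus a number of dereferences; none of these names come from the original text.

```c
#include <assert.h>
#include <stdint.h>

enum { NLOC = 8 };              /* hypothetical bound on the number of locations */
typedef uint32_t LocSet;        /* bit l set <=> location l is in the set */
typedef struct { LocSet succ[NLOC]; } PointsTo;          /* succ[l] = post(A, l) */
typedef struct { unsigned base; unsigned derefs; } Expr; /* '*' applied derefs times to base */
typedef enum { ALIAS_NO, ALIAS_YES, ALIAS_TOP } AliasAnswer;

/* Extended post (Definition 3.4): union of the successors of the given locations. */
LocSet post(const PointsTo *a, LocSet from)
{
    LocSet out = 0;
    for (unsigned l = 0; l < NLOC; ++l)
        if ((from >> l) & 1u)
            out |= a->succ[l];
    return out;
}

/* eval (Definition 3.6): start from {base} and apply post once per '*'. */
LocSet eval(const PointsTo *a, Expr e)
{
    assert(e.base < NLOC);
    LocSet s = (LocSet)1 << e.base;
    for (unsigned d = 0; d < e.derefs; ++d)
        s = post(a, s);
    return s;
}

/* Induced abstract alias query (Definition 3.7). */
AliasAnswer alias_query(const PointsTo *a, Expr e, Expr f)
{
    LocSet E = eval(a, e), F = eval(a, f);
    if ((E & F) == 0)
        return ALIAS_NO;
    if (E == F && (E & (E - 1)) == 0)   /* E is not empty here, so this tests #E = 1 */
        return ALIAS_YES;
    return ALIAS_TOP;
}
```

With 'p' pointing only to 'a', the query on '*p' and 'a' answers 1; as soon as 'p' may also point to 'b' the answer degrades to ⊤.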



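Definition 3.8. (Conditions.) We define the set of conditions as the set

$$\mathrm{Cond} \stackrel{\mathrm{def}}{=} \{\mathrm{eq}, \mathrm{neq}\} \times \mathrm{Expr} \times \mathrm{Expr}.$$

Definition 3.9. (Value of conditions.) Let

$$\mathrm{TrueCond} \subseteq \mathcal{C} \times \mathrm{Cond}$$

be a set defined, for all $C \in \mathcal{C}$ and $e, f \in \mathrm{Expr}$, as

$$\bigl(C, (\mathrm{eq}, e, f)\bigr) \in \mathrm{TrueCond} \iff \mathrm{eval}(C, e) = \mathrm{eval}(C, f);$$
$$\bigl(C, (\mathrm{neq}, e, f)\bigr) \in \mathrm{TrueCond} \iff \bigl(C, (\mathrm{eq}, e, f)\bigr) \notin \mathrm{TrueCond}.$$

Let $C \in \mathcal{C}$ and let $c \in \mathrm{Cond}$; for convenience of notation we write $C \models c$ when
$(C, c) \in \mathrm{TrueCond}$. We also introduce the function

$$\mathrm{modelset} \colon \mathrm{Cond} \to \wp(\mathcal{C}),$$

defined, for all $c \in \mathrm{Cond}$, as

$$\mathrm{modelset}(c) \stackrel{\mathrm{def}}{=} \{\, C \in \mathcal{C} \mid C \models c \,\}.$$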

In other words, 'modelset(c) ' is the set of the concrete memory descriptions where 

the condition c is true. 
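
On a single concrete memory description the relation $C \models c$ is straightforward to compute, since every expression evaluates to exactly one location. The following C sketch assumes that a concrete memory is stored as an array mapping each location to its unique target (non-pointer locations are given an arbitrary target, here themselves, so that the function is total); the identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { NLOC = 6 };     /* hypothetical number of locations */
typedef struct { unsigned base; unsigned derefs; } Expr; /* '*' applied derefs times to base */
typedef enum { COND_EQ, COND_NEQ } CondOp;
typedef struct { CondOp op; Expr lhs, rhs; } Cond;

/* eval on a concrete memory: follow the unique outgoing arc once per '*'. */
unsigned eval_concrete(const unsigned mem[NLOC], Expr e)
{
    assert(e.base < NLOC);
    unsigned l = e.base;
    for (unsigned d = 0; d < e.derefs; ++d) {
        assert(mem[l] < NLOC);
        l = mem[l];
    }
    return l;
}

/* C |= c (Definition 3.9): eq holds when both sides reach the same location,
 * neq holds exactly when eq does not. */
bool models(const unsigned mem[NLOC], Cond c)
{
    bool eq = eval_concrete(mem, c.lhs) == eval_concrete(mem, c.rhs);
    return c.op == COND_EQ ? eq : !eq;
}
```

The set modelset(c) is then the set of the arrays `mem` for which `models(mem, c)` returns true.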

3.2.1 Assignment 

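Definition 3.10. (Assignment evaluation.) We define the set of assignments as

$$\mathrm{Assignments} \stackrel{\mathrm{def}}{=} \mathrm{Expr} \times \mathrm{Expr}.$$

Let

$$\mathrm{assign} \colon \mathcal{A} \times \mathrm{Assignments} \to \mathcal{A}$$

be defined as follows. For all $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$, let

$$\mathrm{assign}\bigl(A, (e, f)\bigr) \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e) \times \mathrm{eval}(A, f) \cup (A \setminus K),$$

where the kill set $K$ is defined as

$$K \stackrel{\mathrm{def}}{=} \begin{cases} \mathrm{eval}(A, e) \times \mathcal{L}, & \text{if } \# \mathrm{eval}(A, e) = 1; \\ \emptyset, & \text{otherwise.} \end{cases}$$

The following lemma shows that the 'assign' function just described also defines
the concrete semantics of the assignment operation, i.e., performing an assignment on
an element of the concrete domain yields another element of the concrete domain.

Lemma 3.11. (Restriction of the assignment to the concrete domain.)
The set $\mathcal{C}$ is closed with respect to the function assign, that is, for all $C \in \mathcal{C}$ and
$a \in \mathrm{Assignments}$ it holds that $\mathrm{assign}(C, a) \in \mathcal{C}$.

Therefore, the function assign restricted to $\mathcal{C}$ can be written as

$$\mathrm{assign} \colon \mathcal{C} \times \mathrm{Assignments} \to \mathcal{C}.$$

In other words, the formalization of the assignment operation given in Definition 3.10
is a generalization of the concrete assignment behaviour. At this point,
we define the concrete semantics of the assignment.

Definition 3.12. (Concrete assignment operation.) Let

$$\mathrm{assign} \colon \wp(\mathcal{C}) \times \mathrm{Assignments} \to \wp(\mathcal{C})$$

be defined, for all $D \subseteq \mathcal{C}$ and $a \in \mathrm{Assignments}$, as

$$\mathrm{assign}(D, a) \stackrel{\mathrm{def}}{=} \{\, \mathrm{assign}(C, a) \mid C \in D \,\}.$$

3.2.2 Filter

Definition 3.13. (Concrete filter semantics.) Let

$$\phi \colon \wp(\mathcal{C}) \times \mathrm{Cond} \to \wp(\mathcal{C})$$

be defined, for all $D \subseteq \mathcal{C}$ and $c \in \mathrm{Cond}$, as

$$\phi(D, c) \stackrel{\mathrm{def}}{=} \mathrm{modelset}(c) \cap D.$$

In other words, given a set $D$ of concrete memory descriptions and a boolean
condition $c$, we denote with $\phi(D, c)$ the subset of $D$ of the elements in which the
condition $c$ is true. We proceed with the definition of the abstract filter operation.
Since we want to track step by step the evaluation of expressions, we extend the
definition of the eval function to allow this.

Definition 3.14. (Extended eval function.) Let

$$\mathrm{eval} \colon \mathcal{A} \times \mathrm{Expr} \times \mathbb{N} \to \wp(\mathcal{L})$$

be inductively defined as follows. Let $A \in \mathcal{A}$, $l \in \mathcal{L}$, $e \in \mathrm{Expr}$ and $i \in \mathbb{N}$; then we
define

$$\mathrm{eval}(A, e, 0) \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e);$$
$$\mathrm{eval}(A, l, i + 1) \stackrel{\mathrm{def}}{=} \emptyset;$$
$$\mathrm{eval}(A, {*}\,e, i + 1) \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e, i).$$

Definition 3.15. (Target function.) We define the function

$$\mathrm{targ} \colon \mathcal{A} \times \wp(\mathcal{L}) \times \mathrm{Expr} \times \mathbb{N} \to \wp(\mathcal{L})$$

inductively as follows. Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$ and $i \in \mathbb{N}$; then we define

$$\mathrm{targ}(A, M, e, 0) \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e) \cap M;$$
$$\mathrm{targ}(A, M, e, i + 1) \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e, i + 1) \cap \mathrm{prev}\bigl(A, \mathrm{targ}(A, M, e, i)\bigr).$$

Definition 3.16. (Filter 1.) Let

$$\phi \colon \mathcal{A} \times \wp(\mathcal{L}) \times \mathrm{Expr} \times \mathbb{N} \to \mathcal{A}$$

be defined as follows. Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$ and $i \in \mathbb{N}$. For convenience of
notation let $x = (A, M, e)$; then we define

$$\phi(x, 0) \stackrel{\mathrm{def}}{=} A;$$
$$\phi(x, i + 1) \stackrel{\mathrm{def}}{=} \begin{cases} A \setminus \bigl(T \times (\mathcal{L} \setminus \mathrm{targ}(x, i))\bigr), & \text{if } \# T = 1; \\ A, & \text{otherwise;} \end{cases}$$

where $T \stackrel{\mathrm{def}}{=} \mathrm{targ}(x, i + 1)$.

Definition 3.17. (Filter 2.) Let

$$\phi \colon \mathcal{A} \times \wp(\mathcal{L}) \times \mathrm{Expr} \to \mathcal{A}$$

be defined, for all $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $l \in \mathcal{L}$ and $e \in \mathrm{Expr}$, as

$$\phi(A, M, l) \stackrel{\mathrm{def}}{=} \begin{cases} A, & \text{if } l \in M; \\ \emptyset, & \text{otherwise;} \end{cases}$$
$$\phi(A, M, {*}\,e) \stackrel{\mathrm{def}}{=} \bigcap_{i \in \mathbb{N}} \phi(A, M, {*}\,e, i).$$

Definition 3.18. (Filter 3.) Let

$$\phi \colon \mathcal{A} \times \mathrm{Cond} \to \mathcal{A}$$

be defined as follows. Let $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$, and let

$$I \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f);$$
$$E \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f);$$
$$F \stackrel{\mathrm{def}}{=} \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e).$$

1 int **pp, *q, *p, *r, a, b, c;
2
3 if (...) pp = &p;
4 else pp = &q;
5 // eval(*pp) = {p, q}
6
7 if (...) r = &a;
8 else r = &c;
9 // eval(*r) = {a, c}
10
11 p = &a;
12 // eval(*p) = {a}
13 q = &b;
14 // eval(*q) = {b}
15
16 *pp = r;
17 // eval(**pp) = eval(*q) = {a, b, c}
18 // eval(*p) = {a, c}

Listing 21: an example of application of the assignment operation.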




Before. After. 
Fig. 5: a representation of the points-to information of the program in Listing 21 before and after 
the assignment at line 16. 



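Then, for all $A \in \mathcal{A}$, the filter on conditions (Definition 3.18) is given by

$$\phi\bigl(A, (\mathrm{eq}, e, f)\bigr) \stackrel{\mathrm{def}}{=} \phi(A, I, e) \cap \phi(A, I, f);$$

$$\phi\bigl(A, (\mathrm{neq}, e, f)\bigr) \stackrel{\mathrm{def}}{=} \begin{cases} \phi(A, E, e) \cup \phi(A, F, f), & \text{if } \# I = 1; \\ A, & \text{otherwise.} \end{cases}$$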

3.3 Examples 

This section presents some examples to illustrate how the model just presented
works.

1 int **pp, *p, *r, a, b, c;
2
3 pp = &p;
4 p = &c;
5 // eval(**pp) = {c}
6 if (...) r = &a;
7 else r = &b;
8 // eval(*r) = {a, b}
9
10 *pp = r;
11 // eval(**pp) = {a, b}

Listing 22: another example of application of the assignment operation.

Example 3. This example is about the abstract assignment operation. Consider
the code in Listing 21. Note that the C assignment '*pp = r' in our simplified
language is expressed as the pair $({*}pp, {*}r)$. Assume that line 15 is reached with the

approximated points-to information $A \in \mathcal{A}$ such that

$$\mathrm{eval}(A, {*}pp) = \{p, q\}, \quad \mathrm{eval}(A, {*}p) = \{a\}, \quad \mathrm{eval}(A, {*}q) = \{b\}, \quad \mathrm{eval}(A, {*}r) = \{a, c\};$$

then

$$\mathrm{eval}(A, {*}pp) \times \mathrm{eval}(A, {*}r) = \{(p, a), (p, c), (q, a), (q, c)\}.$$

The result of the evaluation of the lhs of the assignment, ${*}pp$, contains more than
one location, $p$ and $q$; then from the definition of the assignment operation (Definition 3.10)
we have that the kill set $K$ is empty, and the result of the assignment
can be expressed as

$$\mathrm{assign}\bigl(A, ({*}pp, {*}r)\bigr) = A \cup \mathrm{eval}(A, {*}pp) \times \mathrm{eval}(A, {*}r) = A \cup \{(p, a), (p, c), (q, a), (q, c)\}.$$

Note that after the execution of the assignment (Figure 5), the old values of the
variables 'p' and 'q' are not overwritten, i.e.,

$$\{(p, a), (q, b)\} \subseteq \mathrm{assign}\bigl(A, ({*}pp, {*}r)\bigr).$$

Example 4. This is another example of the application of the abstract assignment
operation. Consider the code in Listing 22. Again, the C assignment '*pp = r'
in our simplified language is expressed as the pair $({*}pp, {*}r)$. Assume that
line 9 is reached with the approximated points-to information $A \in \mathcal{A}$ such that

$$\mathrm{eval}(A, {*}pp) = \{p\}, \quad \mathrm{eval}(A, {*}p) = \{c\}, \quad \mathrm{eval}(A, {*}r) = \{a, b\};$$

then

$$\mathrm{eval}(A, {*}pp) \times \mathrm{eval}(A, {*}r) = \{(p, a), (p, b)\}.$$

Fig. 6 (panels: Before, After): a representation of points-to information before and after the
execution of the assignment operation at line 10 of Listing 22.



1 int *p, *q, a, b, c;
2
3 if (...) p = &a;
4 else p = &b;
5 // eval(*p) = {a, b}
6
7 if (...) q = &b;
8 else q = &c;
9 // eval(*q) = {b, c}
10
11 if (p == q) {
12   // eval(*p) = eval(*q) = {b}
13 }

Listing 23: an example of application of the filter operation.



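But this time the evaluation of the lhs of the assignment, ${*}pp$, contains only one
location, $p$. From Definition 3.10 we have

$$K = \mathrm{eval}(A, {*}pp) \times \mathcal{L} = \{p\} \times \mathcal{L}, \qquad A \cap K = \{(p, c)\},$$

and then (Figure 6)

$$\mathrm{assign}\bigl(A, ({*}pp, {*}r)\bigr) = (A \setminus K) \cup \mathrm{eval}(A, {*}pp) \times \mathrm{eval}(A, {*}r) = \bigl(A \setminus \{(p, c)\}\bigr) \cup \{(p, a), (p, b)\}.$$

Note that, in this case, the assignment deletes the old value of the variable 'p', i.e.,

$$(p, c) \notin \mathrm{assign}\bigl(A, ({*}pp, {*}r)\bigr).$$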
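
Examples 3 and 4 can be replayed with a small C sketch of the abstract assignment of Definition 3.10. The sketch assumes a finite locations set encoded as integers, sets of locations stored as bitmasks and expressions stored as a base location plus a dereference count; these representation choices and all identifiers are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { NLOC = 8 };              /* hypothetical bound on the number of locations */
typedef uint32_t LocSet;        /* bit l set <=> location l is in the set */
typedef struct { LocSet succ[NLOC]; } PointsTo;          /* succ[l] = post(A, l) */
typedef struct { unsigned base; unsigned derefs; } Expr; /* '*' applied derefs times to base */

LocSet post(const PointsTo *a, LocSet from)
{
    LocSet out = 0;
    for (unsigned l = 0; l < NLOC; ++l)
        if ((from >> l) & 1u)
            out |= a->succ[l];
    return out;
}

LocSet eval(const PointsTo *a, Expr e)
{
    assert(e.base < NLOC);
    LocSet s = (LocSet)1 << e.base;
    for (unsigned d = 0; d < e.derefs; ++d)
        s = post(a, s);
    return s;
}

/* assign(A, (e, f)) = eval(A, e) x eval(A, f)  U  (A \ K). */
PointsTo assign(const PointsTo *a, Expr lhs, Expr rhs)
{
    LocSet E = eval(a, lhs), F = eval(a, rhs);
    bool strong = E != 0 && (E & (E - 1)) == 0;   /* #eval(A, e) = 1 */
    PointsTo out = *a;
    for (unsigned l = 0; l < NLOC; ++l) {
        if (((E >> l) & 1u) == 0)
            continue;
        if (strong)
            out.succ[l] = 0;      /* A \ K: drop every old arc leaving l */
        out.succ[l] |= F;         /* add the new arcs {l} x eval(A, f) */
    }
    return out;
}
```

When the lhs evaluates to a single location the old arcs are killed, as in Example 4; otherwise the new arcs are only added to the existing ones, as in Example 3.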

Fig. 7 (panels: Before, After): a representation of the points-to information before and after
the execution of the filter operation on the condition of the if statement at line 11 of Listing 23.

Fig. 8 (stages eval(1), eval(0), targ(1), targ(0)): a representation of the computation of the
filter operation for the example in Listing 23.

Example 5. Consider the example program in Listing 23. As anticipated in the
annotations of the presented code, the filter operation, acting on the condition
'p == q', is able to detect that inside the body of the if statement at line 12 both
'p' and 'q' definitely point to 'b'. Now we want to show step by step how this
result is obtained from the given definitions. Since the situation for 'p' and 'q' is
symmetrical, we show only how it can be derived that 'p' definitely points to 'b'.
Recall that the boolean expression of the C language 'p == q' corresponds, in our



simplified language, to the triple $(\mathrm{eq}, {*}p, {*}q)$. Assume now that line 10 is reached
with the following approximated points-to information $A \in \mathcal{A}$ (Figure 7):

$$\mathrm{eval}(A, {*}p) = \{a, b\}, \qquad \mathrm{eval}(A, {*}q) = \{b, c\}.$$

From the definition of the abstract filter operation (Definition 3.18) we have

$$I = \mathrm{eval}(A, {*}p) \cap \mathrm{eval}(A, {*}q),$$
$$\phi\bigl(A, (\mathrm{eq}, {*}p, {*}q)\bigr) = \phi(A, I, {*}p) \cap \phi(A, I, {*}q).$$

The evaluation of the expressions is illustrated by the following table.

  i    eval(A, *p, i)    eval(A, *q, i)
  1    {p}               {q}
  0    {a, b}            {b, c}

For $i = 0$, the target set of the filter (Definition 3.17) is then defined as

$$I = \mathrm{eval}(A, {*}p) \cap \mathrm{eval}(A, {*}q) = \mathrm{eval}(A, {*}p, 0) \cap \mathrm{eval}(A, {*}q, 0) = \{a, b\} \cap \{b, c\} = \{b\}.$$

Then, recalling from Definition 3.15 that

$$\mathrm{targ}(A, I, e, i + 1) = \mathrm{eval}(A, e, i + 1) \cap \mathrm{prev}\bigl(A, \mathrm{targ}(A, I, e, i)\bigr),$$

we compute backward the sequence of target sets for the expression ${*}p$ as

  i    targ(A, {b}, *p, i)    Removed arcs
  1    {p}                    {(p, a)}
  0    {b}

Since the target set for $i = 1$ consists of the only element $p$ and the node $a$ is
not part of the target set for $i = 0$, the filter removes the arc $(p, a)$ from the
points-to information. See Figure 8 for a graphical representation of the described
situation.

1 int *p, *q, a, b, c, d;
2
3 if (...)
4   if (...) p = &a;
5   else p = &b;
6 else
7   if (...) p = &c;
8   else p = &d;
9 // eval(*p) = {a, b, c, d}
10
11 if (...) q = &b;
12 else q = &c;
13 // eval(*q) = {b, c}
14
15 if (p == q) {
16   // eval(*p) = eval(*q) = {b, c}
17 }

Listing 24: an example of application of the filter operation.

Fig. 9 (panels: Before, After): a representation of the points-to information before and after
the execution of the filter operation on the condition of the if statement at line 15 of Listing 24.
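
The filter computation of Example 5 can be sketched in C as follows. The sketch rests on several assumptions that should be read as such: locations are small integers and sets of locations are bitmasks; the pruning rule of Filter 1 is taken to be the one that the examples of this section suggest (when the target set at step i + 1 is a single location, its arcs leaving the target set of step i are removed); and a filter on a plain location is assumed to give the empty relation when the location is not in the target set. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NLOC = 8 };              /* hypothetical bound on the number of locations */
typedef uint32_t LocSet;        /* bit l set <=> location l is in the set */
typedef struct { LocSet succ[NLOC]; } PointsTo;          /* succ[l] = post(A, l) */
typedef struct { unsigned base; unsigned derefs; } Expr; /* '*' applied derefs times to base */

LocSet post(const PointsTo *a, LocSet from)
{
    LocSet out = 0;
    for (unsigned l = 0; l < NLOC; ++l)
        if ((from >> l) & 1u)
            out |= a->succ[l];
    return out;
}

LocSet prev(const PointsTo *a, LocSet to)
{
    LocSet out = 0;
    for (unsigned l = 0; l < NLOC; ++l)
        if (a->succ[l] & to)
            out |= (LocSet)1 << l;
    return out;
}

/* Extended eval (Definition 3.14): the value i dereferences before the end. */
LocSet eval_at(const PointsTo *a, Expr e, unsigned i)
{
    assert(e.base < NLOC);
    if (i > e.derefs)
        return 0;
    LocSet s = (LocSet)1 << e.base;
    for (unsigned d = 0; d < e.derefs - i; ++d)
        s = post(a, s);
    return s;
}

/* Filter of A on expression e with target set m (assumed Filter 1 and 2 rules). */
PointsTo filter_expr(const PointsTo *a, LocSet m, Expr e)
{
    PointsTo out = *a;
    LocSet target = eval_at(a, e, 0) & m;               /* targ(A, M, e, 0) */
    if (e.derefs == 0 && target == 0)                   /* assumed base case */
        for (unsigned l = 0; l < NLOC; ++l)
            out.succ[l] = 0;
    for (unsigned i = 0; i < e.derefs; ++i) {
        LocSet t = eval_at(a, e, i + 1) & prev(a, target);  /* targ(.., i + 1) */
        if (t != 0 && (t & (t - 1)) == 0)               /* #T = 1 */
            for (unsigned l = 0; l < NLOC; ++l)
                if ((t >> l) & 1u)
                    out.succ[l] &= target;              /* keep arcs into targ(.., i) */
        target = t;
    }
    return out;
}

/* Filter on the condition (eq, e, f), following Definition 3.18. */
PointsTo filter_eq(const PointsTo *a, Expr e, Expr f)
{
    LocSet common = eval_at(a, e, 0) & eval_at(a, f, 0);
    PointsTo x = filter_expr(a, common, e), y = filter_expr(a, common, f), out;
    for (unsigned l = 0; l < NLOC; ++l)
        out.succ[l] = x.succ[l] & y.succ[l];
    return out;
}
```

On the points-to information of Listing 23 the sketch leaves both 'p' and 'q' pointing only to 'b'; on that of Listing 24 it leaves both pointing to 'b' and 'c'.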







Fig. 10: a representation of the computation of the filter operation for the example in Listing 24. 



Example 6. Now we present a similar situation to show that when the abstract
filter operation cuts some arcs (Definition 3.16) what matters is the cardinality
of the set of the "pointers" and not the cardinality of the set of the "pointed" objects.
Consider the code in Listing 24; the points-to information $A \in \mathcal{A}$ at line 14 is
presented in Figure 9. In this case the evaluation of the two expressions ${*}p$ and ${*}q$
proceeds as follows

  i    eval(A, *p, i)    eval(A, *q, i)
  1    {p}               {q}
  0    {a, b, c, d}      {b, c}

For $i = 0$, we have the target set

$$\mathrm{eval}(A, {*}p) \cap \mathrm{eval}(A, {*}q) = \mathrm{eval}(A, {*}p, 0) \cap \mathrm{eval}(A, {*}q, 0) = \{a, b, c, d\} \cap \{b, c\} = \{b, c\}.$$

The computation of the filter on the expression ${*}p$ proceeds as follows (Figure 10)

  i    targ(A, {b, c}, *p, i)    Removed arcs
  1    {p}                       {(p, a), (p, d)}
  0    {b, c}



Example 7. Consider the points-to approximation $A \in \mathcal{A}$ described in Figure 11.
In this case there are two levels of indirection. Assume to filter the points-to
approximation $A$ with respect to the condition $(\mathrm{eq}, {*}{*}pp, a)$. The evaluation of the lhs
and the rhs of the condition proceeds as follows

  i    eval(A, **pp, i)    eval(A, a, i)
  2    {pp}
  1    {p, q}
  0    {a, b, c}           {a}

Fig. 11 (panels: Before, After): a representation of the points-to information $A \in \mathcal{A}$ before
and after the execution of the filter operation on the condition $(\mathrm{eq}, {*}{*}pp, a)$.

Fig. 12 (stages eval(2), eval(1), eval(0)): a representation of the computation of the filter
operation for the example in Figure 11.

Then, for $i = 0$, we have the target set

$$\mathrm{eval}(A, {*}{*}pp) \cap \mathrm{eval}(A, a) = \mathrm{eval}(A, {*}{*}pp, 0) \cap \mathrm{eval}(A, a, 0) = \{a, b, c\} \cap \{a\} = \{a\}.$$

The computation of the filter on the lhs proceeds as

  i    targ(A, {a}, **pp, i)    Removed arcs
  2    {pp}                     {(pp, q)}
  1    {p}
  0    {a}

Figure 12 depicts the computation just described.



Example 8. Consider the points-to approximation $A \in \mathcal{A}$ described in Figure 13.
The evaluation of the expression ${*}{*}pp$ follows the steps

  i    eval(A, **pp, i)
  2    {pp}
  1    {p, q, r}
  0    {a, b}

Assume to filter the points-to approximation $A$ on the condition $(\mathrm{eq}, {*}{*}pp, a)$ and
also on the opposite condition $(\mathrm{neq}, {*}{*}pp, a)$. For $i = 0$, for the equality and the
inequality conditions we have the target sets

$$\mathrm{eval}(A, {*}{*}pp) \cap \mathrm{eval}(A, a) = \{a, b\} \cap \{a\} = \{a\},$$
$$\mathrm{eval}(A, {*}{*}pp) \setminus \mathrm{eval}(A, a) = \{a, b\} \setminus \{a\} = \{b\},$$

respectively. The computation of the filter on the lhs proceeds as

  i    targ(A, {a}, **pp, i)    Removed arcs
  2    {pp}                     {(pp, r)}
  1    {p, q}
  0    {a}

  i    targ(A, {b}, **pp, i)    Removed arcs
  2    {pp}                     {(pp, p), (pp, q)}
  1    {r}
  0    {b}

Figure 14 and Figure 15 depict the filter computation just described.

Fig. 13 (panels: Original, Equality, Inequality): on the left a representation of the initial
points-to information $A \in \mathcal{A}$, in the middle the information resulting from filtering the initial
information $A$ on the condition $(\mathrm{eq}, {*}{*}pp, a)$; finally, on the right, the points-to information
resulting from filtering the approximation $A$ on the condition $(\mathrm{neq}, {*}{*}pp, a)$.

Fig. 14 (stages eval(2), eval(1), eval(0)): a representation of the computation of the filter
operation for the example in Figure 13 on the condition $(\mathrm{eq}, {*}{*}pp, a)$.

Fig. 15 (stages eval(2), eval(1), eval(0)): a representation of the computation of the filter
operation for the example in Figure 13 on the condition $(\mathrm{neq}, {*}{*}pp, a)$.

Fig. 16 (panels: Before, After): on the left a representation of the points-to approximation
$A \in \mathcal{A}$, on the right a representation of the approximation resulting from the application of
the filter on the condition $(\mathrm{eq}, {*}{*}{*}ppp, a)$. The arcs $\{(ppp, rr), (p, b)\}$ have been removed.



Example 9. Now consider the points-to approximation $A \in \mathcal{A}$ described in Figure 16.
The evaluation of the expression ${*}{*}{*}ppp$ follows the steps

  i    eval(A, ***ppp, i)
  3    {ppp}
  2    {pp, qq, rr}
  1    {p, r}
  0    {a, b, c}

Assume to filter the points-to approximation $A$ on the condition $(\mathrm{eq}, {*}{*}{*}ppp, a)$.
For $i = 0$, for the equality condition we have the target set

$$\mathrm{eval}(A, {*}{*}{*}ppp) \cap \mathrm{eval}(A, a) = \{a, b, c\} \cap \{a\} = \{a\}.$$

The computation of the filter on the lhs proceeds as follows

  i    targ(A, {a}, ***ppp, i)    Removed arcs
  3    {ppp}                      {(ppp, rr)}
  2    {pp, qq}
  1    {p}                        {(p, b)}
  0    {a}

Figure 17 depicts the filter computation just described.

Fig. 17 (stages eval(3), eval(2), eval(1), eval(0)): a representation of the computation of the
filter operation for the example in Figure 16 on the condition $(\mathrm{eq}, {*}{*}{*}ppp, a)$.
3.4 Results 

3.4.1 Notation. The proofs are organized as sequences of deductions, for convenience
of notation presented inside tables. Each table is organized in three columns:
the first column contains the tag used to name the step; the second column contains
the statement and the third column contains a list of tags that represents the list
of statements used to infer the current row. There are three kinds of tags. The
first kind of tag, denoted as 'TS', is used to mark the thesis, which, if explicitly
presented, always occurs in the top row. The second kind of tag is used to describe
the hypotheses, marked as 'H0', . . . , 'Hn'. Among the hypotheses we improperly list
the lemmas used in the proof. The third kind of tag is used to describe deductions,
displayed as 'D0', . . . , 'Dn', with the exception of the last deduction step, which is
tagged with the symbol '✠'. Within the table, the hypotheses are displayed below
the thesis and the deductions below the hypotheses. To stress the separation of the
thesis from the hypotheses and of the hypotheses from the deductions, horizontal
lines are used. Deductions are sorted in topological order, such
that, if the deduction 'Dm' requires the deduction 'Dn', then m > n. When the
proof consists of several cases, multiple tables are used; in this case, an initial
table containing the hypotheses common to all cases may be present. Cases are
marked as 'C1', . . . , 'Cn'; if a hypothesis comes from considering the case 'Cn',
then the tag 'Cn' is also reported in the third column of the corresponding row. In
inductive proofs, the inductive hypothesis is marked with '(ind. hyp.)'.

3.4.2 Concrete Assignment. We start by showing that the assignment operation 
is closed with respect to the set of the concrete memory descriptions C. 

Lemma 3.19. (Eval cardinality on the concrete domain.) Let $C \in \mathcal{C}$ and
$e \in \mathrm{Expr}$, then $\# \mathrm{eval}(C, e) = 1$.

Proof. Let $C \in \mathcal{C}$. We proceed by induction on the definition of the set Expr
(Definition 3.5).

  TS    $\# \mathrm{eval}(C, e) = 1$
  H0    Definition 3.6, the eval function.

For the base case let $e = l \in \mathcal{L}$.

  TS    $\# \mathrm{eval}(C, l) = 1$
  D0    $\mathrm{eval}(C, l) = \{l\}$                                      (H0)
  ✠     $\# \mathrm{eval}(C, l) = 1$                                       (D0)

For the inductive case let $e \in \mathrm{Expr}$.

  TS    $\# \mathrm{eval}(C, {*}\,e) = 1$
  H1    $\# \mathrm{eval}(C, e) = 1$                                       (ind. hyp.)
  H2    Definition 3.1, the concrete domain.
  H3    Definition 3.3, the post function.
  D0    $\mathrm{eval}(C, {*}\,e) = \mathrm{post}(C, \mathrm{eval}(C, e))$                 (H0)
  D1    $\forall l \in \mathcal{L} : \# \{ (l, m) \in C \} = 1$                  (H2)
  D2    $\# \{ (l, m) \in C \mid l \in \mathrm{eval}(C, e) \} = 1$              (H1, D1)
  D3    $\# \mathrm{post}(C, \mathrm{eval}(C, e)) = 1$                            (D2, H3)
  ✠     $\# \mathrm{eval}(C, {*}\,e) = 1$                                  (D3, D0)



□ 
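The lemma can be illustrated with a small executable sketch. It assumes, following the proof tables rather than quoting Definitions 3.3 to 3.6, that a memory description is a set of arcs (l, m), that post collects the targets of a set of locations, and that eval(C, *e) = post(C, eval(C, e)). The names `post` and `eval_expr` and the encoding of a dereference as the tuple `("*", e)` are illustrative choices, not part of the paper.

```python
# Sketch of post/eval on a memory description given as a set of arcs (l, m).
# A location is a string; the dereference *e is encoded as ("*", e).

def post(desc, locs):
    """Extended post: all targets of the locations in `locs`."""
    return {m for (l, m) in desc if l in locs}

def eval_expr(desc, e):
    """eval(desc, l) = {l};  eval(desc, *e) = post(desc, eval(desc, e))."""
    if isinstance(e, str):
        return {e}
    _, inner = e
    return post(desc, eval_expr(desc, inner))

# A concrete memory: every location has exactly one outgoing arc.
C = {("p", "q"), ("q", "x"), ("x", "x")}

# On a concrete memory every expression evaluates to exactly one location.
cardinalities = [len(eval_expr(C, expr))
                 for expr in ["p", ("*", "p"), ("*", ("*", "p"))]]
```

On this memory each entry of `cardinalities` is 1, which is the statement of Lemma 3.19 for these three expressions.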
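Lemma 3.20. (Assignment on the concrete domain.) Let $C \in \mathcal{C}$ and $e, f \in \mathrm{Expr}$.
For convenience of notation, let $a \in \mathrm{Assignments}$ be such that $a = (e, f)$. Let

  $\mathrm{eval}(C, e) = \{l\}$;
  $\mathrm{post}(C, l) = \{n\}$;
  $\mathrm{eval}(C, f) = \{m\}$;

then

  $\mathrm{assign}(C, a) = \bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\}$.

Proof. Let $C \in \mathcal{C}$. First note that from the definition of the concrete domain
(Definition 3.1) and Lemma 3.19, $\# \mathrm{eval}(C, e) = \# \mathrm{eval}(C, f) = 1$. From the
definition of the post function (Definition 3.3) also $\# \mathrm{post}(C, l) = 1$. Thus the
above statement is well formed.

  TS   $\mathrm{assign}(C, a) = \bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\}$
  -----
  H0   $\{l\} = \mathrm{eval}(C, e)$
  H1   $\{n\} = \mathrm{post}(C, l)$
  H2   $\{m\} = \mathrm{eval}(C, f)$
  H3   Definition 3.10, the assignment evaluation.
  -----
  D0   $\# \mathrm{eval}(C, e) = 1$    (H0)
  D1   $\mathrm{assign}(C, a) = \mathrm{eval}(C, e) \times \mathrm{eval}(C, f) \cup \bigl(C \setminus \mathrm{eval}(C, e) \times \mathcal{L}\bigr)$    (D0, H3)
  D2   $\mathrm{eval}(C, e) \times \mathrm{eval}(C, f) = \{(l, m)\}$    (H0, H2)
  D3   $C \cap \mathrm{eval}(C, e) \times \mathcal{L} = \{(l, n)\}$    (H1)
  D4   $C \setminus \mathrm{eval}(C, e) \times \mathcal{L} = C \setminus \{(l, n)\}$    (D3)
  ✠    $\mathrm{assign}(C, a) = \bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\}$    (D4, D2, D1)

□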

Proof. (Restriction of the assignment to the concrete, Lemma 3.11.) 

This result is a simple corollary of Lemma 3.20. □ 
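A worked instance of Lemma 3.20 can be sketched as follows. The `assign` function below is an assumed reading of Definition 3.10 reconstructed from the proof tables (strong update when the left-hand side evaluates to a single location, weak update otherwise); the function names and the tuple encoding `("*", e)` are illustrative.

```python
# Sketch of the assignment on a memory description (set of arcs).

def post(desc, locs):
    return {m for (l, m) in desc if l in locs}

def eval_expr(desc, e):
    # A location is a string; ("*", e) encodes the dereference *e.
    return {e} if isinstance(e, str) else post(desc, eval_expr(desc, e[1]))

def assign(desc, a, locations):
    """Assumed reading of Definition 3.10 for the assignment a = (e, f)."""
    e, f = a
    lhs, rhs = eval_expr(desc, e), eval_expr(desc, f)
    new_arcs = {(l, m) for l in lhs for m in rhs}
    if len(lhs) == 1:
        # Strong update: the old arcs leaving the assigned location are removed.
        killed = {(l, m) for l in lhs for m in locations}
        return new_arcs | (desc - killed)
    # Weak update: the old arcs are kept.
    return new_arcs | desc

LOCS = {"p", "q", "x"}
C = {("p", "q"), ("q", "x"), ("x", "x")}   # concrete: one arc per location
a = (("*", "p"), "p")                      # the assignment *p := p

# Here eval(C, *p) = {q}, post(C, q) = {x} and eval(C, p) = {p}.
result = assign(C, a, LOCS)
expected = (C - {("q", "x")}) | {("q", "p")}
```

With l = q, n = x and m = p, `result` coincides with `expected`, and the result is again a concrete memory (one outgoing arc per location).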

3.4.3 Observations on the Domain. First we present the following simple result 
about the monotonicity of the concretization function. 

Lemma 3.21. (Monotonicity of the concretization function.) Let $A, B \in \mathcal{A}$; then

  $A \subseteq B \implies \gamma(A) \subseteq \gamma(B)$.

Proof. Let $A, B \in \mathcal{A}$. If $\gamma(A) = \emptyset$ then the thesis is trivially verified. Otherwise
let $C \in \gamma(A)$; we have to show that $C \in \gamma(B)$ too.

  TS   $C \in \gamma(B)$
  -----
  H0   $C \in \gamma(A)$
  H1   Definition 3.2, the concretization function.
  H2   $A \subseteq B$
  -----
  D0   $C \subseteq A$    (H0, H1)
  D1   $C \in \mathcal{C}$    (H0, H1)
  D2   $C \subseteq B$    (D0, H2)
  ✠    $C \in \gamma(B)$    (D2, D1, H1)

□



From the definition of the concrete and of the abstract domain (Definition 3.1) 
and the definition of the concretization function (Definition 3.2) we complete the 
description of the abstraction by presenting the abstraction function. 

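Definition 3.22. (Abstraction function.) Let

  $\alpha \colon \wp(\mathcal{C}) \to \mathcal{A}$

be defined, for all $C \subseteq \mathcal{C}$, as

  $\alpha(C) \overset{\mathrm{def}}{=} \bigcup_{D \in C} D$.

It is possible to show that $(\wp(\mathcal{C}), \alpha, \mathcal{A}, \gamma)$ is a Galois connection, that is, for all
$C \subseteq \mathcal{C}$ and $A \in \mathcal{A}$, it holds that

  $\alpha(C) \subseteq A \iff C \subseteq \gamma(A)$.

Indeed, given $C \subseteq \mathcal{C}$ and $A \in \mathcal{A}$, the following statements are all equivalent:

  $\alpha(C) \subseteq A$,
  $\bigcup_{D \in C} D \subseteq A$,
  $\forall D \in C : D \subseteq A$,
  $\forall D \in C : D \in \gamma(A)$,
  $C \subseteq \gamma(A)$.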
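The equivalence above can be checked exhaustively on a tiny universe. The sketch below assumes two locations, takes the abstract domain to be all sets of arcs over them, and takes concrete memories to be the arc sets with exactly one outgoing arc per location; `alpha`, `gamma` and the helper names are illustrative.

```python
from itertools import combinations, product

LOCS = ("l", "m")
ARCS = [(a, b) for a in LOCS for b in LOCS]

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Concrete memories: exactly one outgoing arc per location.
CONCRETE = [frozenset(zip(LOCS, targets))
            for targets in product(LOCS, repeat=len(LOCS))]

def gamma(A):
    """Concretization: the concrete memories contained in A."""
    return {C for C in CONCRETE if C <= A}

def alpha(memories):
    """Abstraction: the union of a set of concrete memories."""
    out = frozenset()
    for C in memories:
        out |= C
    return out

def galois_holds():
    """Check alpha(S) <= A  <=>  S <= gamma(A) for every S and A."""
    for A in powerset(ARCS):
        concretization = gamma(A)
        for S in powerset(CONCRETE):
            left = alpha(S) <= A
            right = all(C in concretization for C in S)
            if left != right:
                return False
    return True
```

The same enumeration can be reused to observe that alpha(gamma(A)) is always contained in A, and equals A whenever gamma(A) is non-empty.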

The following result also holds for the presented abstraction. The lemma
shows that, given a non-bottom abstraction $A \in \mathcal{A}$, for each arc $(l, m) \in A$
there is a concrete memory $C$ abstracted by $A$ that contains the arc $(l, m)$.

Lemma 3.23. (Concrete coverage.) Let $A \in \mathcal{A}$; then

  $\gamma(A) \neq \emptyset \implies \forall (l, m) \in A : \exists C \in \gamma(A) \,.\, (l, m) \in C$.

Proof. Let $A \in \mathcal{A}$ such that $\gamma(A) \neq \emptyset$ and let $(l, m) \in A$. Let $C \in \gamma(A)$, let
$n \in \mathcal{L}$.

  TS   $\exists D \in \gamma(A) \,.\, (l, m) \in D$
  -----
  H0   $(l, m) \in A$
  H1   $C \in \gamma(A)$
  H2   $\{n\} = \mathrm{post}(C, l)$
  H3   Definition 3.2, concretization function.
  H4   Definition 3.3, the post function.
  -----
  D0   $C \subseteq A$    (H1, H3)
  D1   $C \setminus \{(l, n)\} \subseteq A$    (D0)
  D2   $(l, n) \in C$    (H2, H4)
  D3   $\bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\} \in \mathcal{C}$    (D2)
  D4   $\bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\} \subseteq A$    (D1, H0)
  D5   $\bigl(C \setminus \{(l, n)\}\bigr) \cup \{(l, m)\} \in \gamma(A)$    (D3, D4, H3)
  ✠    $\exists D \in \gamma(A) \,.\, (l, m) \in D$    (D5)

□

Lemma 3.24. (Abstraction effect.) Let $A \in \mathcal{A}$; then

  $\alpha(\gamma(A)) \subseteq A$;

moreover

  $\gamma(A) \neq \emptyset \implies \alpha(\gamma(A)) = A$.

Proof. Let $A \in \mathcal{A}$. Consider that

  H0   Definition 3.22, the abstraction function.
  D0   $\alpha(\gamma(A)) = \bigcup_{C \in \gamma(A)} C$    (H0)

We proceed by showing the two inclusions separately. For the first inclusion let
$(l, m) \in \alpha(\gamma(A))$; then we have

  TS   $(l, m) \in A$
  -----
  H1   $(l, m) \in \alpha(\gamma(A))$
  H2   Definition 3.2, the concretization function.
  -----
  D1   $\exists C \in \gamma(A) \,.\, (l, m) \in C$    (D0, H1)
  D2   $\forall C \in \gamma(A) : C \subseteq A$    (H2)
  ✠    $(l, m) \in A$    (D1, D2)

For the second inclusion assume that $\gamma(A) \neq \emptyset$ and let $(l, m) \in A$; then we have

  TS   $(l, m) \in \alpha(\gamma(A))$
  -----
  H1   $\gamma(A) \neq \emptyset$
  H2   $(l, m) \in A$
  H3   Lemma 3.23, concrete coverage.
  -----
  D1   $\exists C \in \gamma(A) \,.\, (l, m) \in C$    (H2, H3, H1)
  ✠    $(l, m) \in \alpha(\gamma(A))$    (D0, D1)

□
3.4.4 Results of Correctness. We formalize the requirement of correctness of the
abstract operations presented (the expression evaluation, the assignment and the
filter operations) with the following theorems.

Theorem 3.25. (Correctness of expression evaluation.) Let $A \in \mathcal{A}$ and
$e \in \mathrm{Expr}$; then, for all $C \in \gamma(A)$,

  $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$.

Theorem 3.26. (Correctness of the assignment.) Let $A \in \mathcal{A}$ and
$a \in \mathrm{Assignments}$; then

  $\mathrm{assign}(\gamma(A), a) \subseteq \gamma(\mathrm{assign}(A, a))$.

Theorem 3.27. (Correctness of the filter.) Let $A \in \mathcal{A}$ and $c \in \mathrm{Cond}$; then

  $\phi(\gamma(A), c) \subseteq \gamma(\phi(A, c))$.

3.4.5 Proofs. We present some technical lemmas that will lead to the proof of 
the correctness theorems. 

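Lemma 3.28. (Monotonicity of post.) Let $A, B \in \mathcal{A}$ and $l \in \mathcal{L}$; then

  $A \subseteq B \implies \mathrm{post}(A, l) \subseteq \mathrm{post}(B, l)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$. Let $l, m \in \mathcal{L}$.

  TS   $m \in \mathrm{post}(B, l)$
  -----
  H0   $m \in \mathrm{post}(A, l)$
  H1   $A \subseteq B$
  H2   Definition 3.3, the post function.
  -----
  D0   $(l, m) \in A$    (H0, H2)
  D1   $(l, m) \in B$    (D0, H1)
  ✠    $m \in \mathrm{post}(B, l)$    (D1, H2)

□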

Lemma 3.29. (Monotonicity of the extended post function 1.)
Let $A, B \in \mathcal{A}$ and $L \subseteq \mathcal{L}$; then

  $A \subseteq B \implies \mathrm{post}(A, L) \subseteq \mathrm{post}(B, L)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$ and let $L \subseteq \mathcal{L}$. Let $m \in \mathcal{L}$.

  TS   $m \in \mathrm{post}(B, L)$
  -----
  H0   $m \in \mathrm{post}(A, L)$
  H1   $A \subseteq B$
  H2   Definition 3.4, the extended post function.
  H3   Lemma 3.28, monotonicity of the post function.
  -----
  D0   $\exists l \in L \,.\, m \in \mathrm{post}(A, l)$    (H0, H2)
  D1   $\exists l \in L \,.\, m \in \mathrm{post}(B, l)$    (D0, H1, H3)
  ✠    $m \in \mathrm{post}(B, L)$    (D1, H2)

□

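Lemma 3.30. (Monotonicity of the extended post function 2.) Let $A \in \mathcal{A}$
and $L, M \subseteq \mathcal{L}$; then

  $L \subseteq M \implies \mathrm{post}(A, L) \subseteq \mathrm{post}(A, M)$.

Proof. Let $A \in \mathcal{A}$ and let $L \subseteq M \subseteq \mathcal{L}$. Let $m \in \mathcal{L}$.

  TS   $m \in \mathrm{post}(A, M)$
  -----
  H0   $m \in \mathrm{post}(A, L)$
  H1   $L \subseteq M$
  H2   Definition 3.4, the extended post function.
  -----
  D0   $\exists l \in L \,.\, m \in \mathrm{post}(A, l)$    (H0, H2)
  D1   $\exists l \in M \,.\, m \in \mathrm{post}(A, l)$    (D0, H1)
  ✠    $m \in \mathrm{post}(A, M)$    (D1, H2)

□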

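Lemma 3.31. (Monotonicity of the eval function.) Let $A, B \in \mathcal{A}$ and
$e \in \mathrm{Expr}$; then

  $A \subseteq B \implies \mathrm{eval}(A, e) \subseteq \mathrm{eval}(B, e)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$. We proceed inductively on the definition
of the eval function. For the base case let $l \in \mathcal{L}$.

  TS   $\mathrm{eval}(A, l) \subseteq \mathrm{eval}(B, l)$
  -----
  H0   Definition 3.6, the eval function.
  -----
  D0   $\mathrm{eval}(A, l) = \{l\}$    (H0)
  D1   $\mathrm{eval}(B, l) = \{l\}$    (H0)
  ✠    $\mathrm{eval}(A, l) \subseteq \mathrm{eval}(B, l)$    (D1, D0)

For the inductive case let $e \in \mathrm{Expr}$.

  TS   $\mathrm{eval}(A, *e) \subseteq \mathrm{eval}(B, *e)$
  -----
  H0   $A \subseteq B$
  H1   Definition 3.6, the eval function.
  H2   Lemma 3.29, monotonicity of the ext. post 1.
  H3   Lemma 3.30, monotonicity of the ext. post 2.
  H4   $\mathrm{eval}(A, e) \subseteq \mathrm{eval}(B, e)$    (ind. hyp.)
  -----
  D0   $\mathrm{eval}(A, *e) = \mathrm{post}(A, \mathrm{eval}(A, e))$    (H1)
  D1   $\mathrm{eval}(B, *e) = \mathrm{post}(B, \mathrm{eval}(B, e))$    (H1)
  D2   $\mathrm{post}(A, \mathrm{eval}(A, e)) \subseteq \mathrm{post}(A, \mathrm{eval}(B, e))$    (H4, H3)
  D3   $\mathrm{post}(A, \mathrm{eval}(B, e)) \subseteq \mathrm{post}(B, \mathrm{eval}(B, e))$    (H0, H2)
  D4   $\mathrm{post}(A, \mathrm{eval}(A, e)) \subseteq \mathrm{post}(B, \mathrm{eval}(B, e))$    (D2, D3)
  ✠    $\mathrm{eval}(A, *e) \subseteq \mathrm{eval}(B, *e)$    (D4, D1, D0)

□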
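The monotonicity of eval can also be observed by brute force on a small universe. The sketch below assumes two locations, represents abstract elements as sets of arcs, and uses the illustrative encoding `("*", e)` for a dereference; it checks every pair A, B with A contained in B against a few sample expressions.

```python
from itertools import combinations

LOCS = ("l", "m")
ARCS = [(a, b) for a in LOCS for b in LOCS]

def post(desc, locs):
    return {m for (l, m) in desc if l in locs}

def eval_expr(desc, e):
    # eval(A, l) = {l};  eval(A, *e) = post(A, eval(A, e))
    return {e} if isinstance(e, str) else post(desc, eval_expr(desc, e[1]))

def subsets(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

EXPRS = ["l", ("*", "l"), ("*", ("*", "l")), ("*", ("*", "m"))]

def eval_is_monotone():
    """Check A <= B  =>  eval(A, e) <= eval(B, e) on all pairs and EXPRS."""
    for B in subsets(ARCS):
        for A in subsets(B):          # every A with A <= B
            for e in EXPRS:
                if not eval_expr(A, e) <= eval_expr(B, e):
                    return False
    return True
```

This is only a finite sanity check of the statement, not a substitute for the inductive proof.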
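Lemma 3.32. (Monotonicity of the extended eval 1.) Let $A, B \in \mathcal{A}$, $e \in \mathrm{Expr}$
and $i \in \mathbb{N}$; then

  $A \subseteq B \implies \mathrm{eval}(A, e, i) \subseteq \mathrm{eval}(B, e, i)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$. We proceed inductively on the definition
of the extended eval function.

  H0   $A \subseteq B$
  H1   Definition 3.14, the extended eval function.

For the first base case let $l \in \mathcal{L}$ and let $i \in \mathbb{N}$.

  TS   $\mathrm{eval}(A, l, i + 1) \subseteq \mathrm{eval}(B, l, i + 1)$
  -----
  D0   $\mathrm{eval}(A, l, i + 1) = \emptyset$    (H1)
  D1   $\mathrm{eval}(B, l, i + 1) = \emptyset$    (H1)
  ✠    $\mathrm{eval}(A, l, i + 1) \subseteq \mathrm{eval}(B, l, i + 1)$    (D0, D1)

For the second base case let $e \in \mathrm{Expr}$.

  TS   $\mathrm{eval}(A, e, 0) \subseteq \mathrm{eval}(B, e, 0)$
  -----
  H2   Lemma 3.31, the monotonicity of eval.
  -----
  D0   $\mathrm{eval}(A, e, 0) = \mathrm{eval}(A, e)$    (H1)
  D1   $\mathrm{eval}(B, e, 0) = \mathrm{eval}(B, e)$    (H1)
  D2   $\mathrm{eval}(A, e) \subseteq \mathrm{eval}(B, e)$    (H0, H2)
  ✠    $\mathrm{eval}(A, e, 0) \subseteq \mathrm{eval}(B, e, 0)$    (D2, D1, D0)

For the inductive step let $e \in \mathrm{Expr}$ and let $i \in \mathbb{N}$.

  TS   $\mathrm{eval}(A, *e, i + 1) \subseteq \mathrm{eval}(B, *e, i + 1)$
  -----
  H2   $\mathrm{eval}(A, e, i) \subseteq \mathrm{eval}(B, e, i)$    (ind. hyp.)
  -----
  D0   $\mathrm{eval}(A, *e, i + 1) = \mathrm{eval}(A, e, i)$    (H1)
  D1   $\mathrm{eval}(B, *e, i + 1) = \mathrm{eval}(B, e, i)$    (H1)
  D2   $\mathrm{eval}(A, e, i) \subseteq \mathrm{eval}(B, e, i)$    (H2)
  ✠    $\mathrm{eval}(A, *e, i + 1) \subseteq \mathrm{eval}(B, *e, i + 1)$    (D2, D1, D0)

□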

Lemma 3.33. (Eval cardinality on the abstract domain.) Let $A \in \mathcal{A}$ and
$e \in \mathrm{Expr}$; then

  $\gamma(A) \neq \emptyset \implies \# \mathrm{eval}(A, e) > 0$.

Proof. Let $A \in \mathcal{A}$ such that $\gamma(A) \neq \emptyset$. Let $e \in \mathrm{Expr}$.

  TS   $\# \mathrm{eval}(A, e) > 0$
  -----
  H0   $\gamma(A) \neq \emptyset$
  H1   Definition 3.2, the concretization function.
  H2   Lemma 3.19, eval cardinality on the concrete domain.
  H3   Lemma 3.31, monotonicity of the eval function.
  -----
  D0   $\exists C \in \mathcal{C} \,.\, C \in \gamma(A)$    (H0)
  D1   $\exists C \in \mathcal{C} \,.\, C \subseteq A$    (D0, H1)
  D2   $\exists C \in \mathcal{C} \,.\, \mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$    (D1, H3)
  D3   $\exists C \in \mathcal{C} \,.\, \# \mathrm{eval}(C, e) \leq \# \mathrm{eval}(A, e)$    (D2)
  D4   $1 \leq \# \mathrm{eval}(A, e)$    (D3, H2)
  ✠    $\# \mathrm{eval}(A, e) > 0$    (D4)

□

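Lemma 3.34. (Extended eval cardinality on the abstract domain.) Let
$A, B \in \mathcal{A}$, $e \in \mathrm{Expr}$ and $i \in \mathbb{N}$; then

  $\bigl(\gamma(A) \neq \emptyset \land A \subseteq B \land \# \mathrm{eval}(B, e, i) > 0\bigr) \implies \# \mathrm{eval}(A, e, i) > 0$.

Proof. Let $A, B \in \mathcal{A}$, let $e \in \mathrm{Expr}$ and let $i \in \mathbb{N}$.

  TS   $\# \mathrm{eval}(B, e, i) > 0 \implies \# \mathrm{eval}(A, e, i) > 0$
  -----
  H0   $A \subseteq B$
  H1   $\gamma(A) \neq \emptyset$
  H2   Definition 3.14, the extended eval function.

We proceed by induction on $i$ and on $e$ (Definition 3.5). For the base case let $i = 0$.

  TS   $\# \mathrm{eval}(B, e, 0) > 0 \implies \# \mathrm{eval}(A, e, 0) > 0$
  -----
  H3   Lemma 3.33, eval cardinality on the abstract domain.
  -----
  D0   $\mathrm{eval}(A, e, 0) = \mathrm{eval}(A, e)$    (H2)
  D1   $\# \mathrm{eval}(A, e) > 0$    (H1, H3)
  ✠    $\# \mathrm{eval}(B, e, 0) > 0 \implies \# \mathrm{eval}(A, e, 0) > 0$    (D1, D0)

For the second base case let $e = l \in \mathcal{L}$ and consider a positive index $i + 1 > 0$.

  TS   $\# \mathrm{eval}(B, l, i + 1) > 0 \implies \# \mathrm{eval}(A, l, i + 1) > 0$
  -----
  D0   $\mathrm{eval}(B, l, i + 1) = \emptyset$    (H2)
  D1   $\# \mathrm{eval}(B, l, i + 1) = 0$    (D0)
  ✠    $\# \mathrm{eval}(B, l, i + 1) > 0 \implies \# \mathrm{eval}(A, l, i + 1) > 0$    (D1)

For the inductive case let $e = *f$ where $f \in \mathrm{Expr}$.

  TS   $\# \mathrm{eval}(B, *f, i + 1) > 0 \implies \# \mathrm{eval}(A, *f, i + 1) > 0$
  -----
  H3   $\# \mathrm{eval}(B, f, i) > 0 \implies \# \mathrm{eval}(A, f, i) > 0$    (ind. hyp.)
  -----
  D0   $\mathrm{eval}(B, *f, i + 1) = \mathrm{eval}(B, f, i)$    (H2)
  D1   $\mathrm{eval}(A, *f, i + 1) = \mathrm{eval}(A, f, i)$    (H2)
  ✠    $\# \mathrm{eval}(B, *f, i + 1) > 0 \implies \# \mathrm{eval}(A, *f, i + 1) > 0$    (H3, D0, D1)

□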



Proof. (Correctness of the expression evaluation, Theorem 3.25.) Let
$A \in \mathcal{A}$ and let $e \in \mathrm{Expr}$. We distinguish two cases. First case: $\gamma(A) = \emptyset$; then the
thesis is trivially verified. For the second case, $\gamma(A) \neq \emptyset$, let $C \in \gamma(A)$. Then we
have

  TS   $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$
  -----
  H0   $C \in \gamma(A)$
  H1   Definition 3.2, the concretization function.
  H2   Lemma 3.31, monotonicity of the eval function.
  -----
  D0   $C \subseteq A$    (H0, H1)
  ✠    $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$    (D0, H2)

□

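Lemma 3.35. (Effects of the assignment.) Let $A \in \mathcal{A}$, $l \in \mathcal{L}$ and $e, f \in \mathrm{Expr}$;
for convenience of notation let $a \in \mathrm{Assignments}$ be such that $a = (e, f)$ and
$E = \mathrm{eval}(A, e)$. Then

$$\mathrm{post}(\mathrm{assign}(A, a), l) = \begin{cases} \mathrm{post}(A, l), & \text{if } l \notin E; \\ \mathrm{eval}(A, f), & \text{if } E = \{l\}; \\ \mathrm{post}(A, l) \cup \mathrm{eval}(A, f), & \text{if } l \in E \land \# E > 1. \end{cases}$$

Proof. Let $A \in \mathcal{A}$, let $(e, f) = a \in \mathrm{Assignments}$ and let $l \in \mathcal{L}$. We proceed case
by case. These are our initial hypotheses.

  H0   Definition 3.10, assignment definition.
  H1   Definition 3.3, the post function.

We consider separately the three cases of the lemma:

  $l \notin \mathrm{eval}(A, e)$;    (C1)
  $\{l\} = \mathrm{eval}(A, e)$;    (C2)
  $l \in \mathrm{eval}(A, e) \land \# \mathrm{eval}(A, e) > 1$.    (C3)

First case.

  TS   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{post}(A, l)$
  -----
  H2   $l \notin \mathrm{eval}(A, e)$    (C1)

To prove TS we prove the two inclusions.

  $\mathrm{post}(A, l) \subseteq \mathrm{post}(\mathrm{assign}(A, a), l)$;    (C1.1)
  $\mathrm{post}(\mathrm{assign}(A, a), l) \subseteq \mathrm{post}(A, l)$.    (C1.2)

Let $m \in \mathcal{L}$. For the first sub-case we have

  TS   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$
  -----
  H3   $m \in \mathrm{post}(A, l)$    (C1.1)
  -----
  D0   $(l, m) \in A$    (H3, H1)
  D1   $(l, m) \notin \mathrm{eval}(A, e) \times \mathcal{L}$    (H2)
  D2   $(l, m) \in \mathrm{assign}(A, a)$    (D0, D1, H0)
  ✠    $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (D2, H1)

For the second sub-case:

  TS   $m \in \mathrm{post}(A, l)$
  -----
  H3   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (C1.2)
  -----
  D0   $(l, m) \in \mathrm{assign}(A, a)$    (H3, H1)
  D1   $(l, m) \notin \mathrm{eval}(A, e) \times \mathrm{eval}(A, f)$    (H2)
  D2   $(l, m) \in A$    (D0, D1, H0)
  ✠    $m \in \mathrm{post}(A, l)$    (D2, H1)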



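For the second and third cases we have to prove an intermediate result:

  $l \in \mathrm{eval}(A, e) \implies \mathrm{eval}(A, f) \subseteq \mathrm{post}(\mathrm{assign}(A, a), l)$.

Let $m \in \mathcal{L}$.

  TS   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$
  -----
  H2   $l \in \mathrm{eval}(A, e)$    (C2)
  H3   $m \in \mathrm{eval}(A, f)$    (C2)
  -----
  D0   $\mathrm{eval}(A, e) \times \mathrm{eval}(A, f) \subseteq \mathrm{assign}(A, a)$    (H0)
  D1   $(l, m) \in \mathrm{eval}(A, e) \times \mathrm{eval}(A, f)$    (H2, H3)
  D2   $(l, m) \in \mathrm{assign}(A, a)$    (D1, D0)
  ✠    $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (D2, H1)

Note that for both the second and third case we assume that $l \in \mathrm{eval}(A, e)$, thus
$\# \mathrm{eval}(A, e) > 0$, so we will check only the cases $\# \mathrm{eval}(A, e) = 1$ (2nd case) and
$\# \mathrm{eval}(A, e) > 1$ (3rd case). Now the second case.

  TS   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{eval}(A, f)$
  -----
  H2   $l \in \mathrm{eval}(A, e)$
  H3   $\# \mathrm{eval}(A, e) = 1$
  -----
  D0   $\mathrm{assign}(A, a) = \{l\} \times \mathrm{eval}(A, f) \cup \bigl(A \setminus (\{l\} \times \mathcal{L})\bigr)$    (H3, H2, H0)

Also in this case, to prove TS we prove the two inclusions

  $\mathrm{post}(\mathrm{assign}(A, a), l) \subseteq \mathrm{eval}(A, f)$;    (C2.1)
  $\mathrm{post}(\mathrm{assign}(A, a), l) \supseteq \mathrm{eval}(A, f)$.    (C2.2)

One inclusion (C2.2) comes by modus ponens by applying the hypothesis H2 to the
previous intermediate result. For the other inclusion (C2.1), let $m \in \mathcal{L}$; then we
have

  TS   $m \in \mathrm{eval}(A, f)$
  -----
  H4   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (C2.1)
  -----
  D1   $(l, m) \in \mathrm{assign}(A, a)$    (H4, H1)
  D2   $(l, m) \notin A \setminus (\{l\} \times \mathcal{L})$
  D3   $(l, m) \in \{l\} \times \mathrm{eval}(A, f)$    (D2, D1, D0)
  ✠    $m \in \mathrm{eval}(A, f)$    (D3)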



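Now the third case.

  TS   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{post}(A, l) \cup \mathrm{eval}(A, f)$
  -----
  H2   $l \in \mathrm{eval}(A, e)$    (C3)
  H3   $\# \mathrm{eval}(A, e) > 1$    (C3)
  -----
  D0   $\mathrm{assign}(A, a) = \mathrm{eval}(A, e) \times \mathrm{eval}(A, f) \cup A$    (H3, H2, H0)

Again, we prove separately the two inclusions.

  $\mathrm{post}(\mathrm{assign}(A, a), l) \subseteq \mathrm{post}(A, l) \cup \mathrm{eval}(A, f)$;    (C3.1)
  $\mathrm{post}(\mathrm{assign}(A, a), l) \supseteq \mathrm{post}(A, l) \cup \mathrm{eval}(A, f)$.    (C3.2)

For the inclusion (C3.2), applying the modus ponens to the hypothesis H2 and to
the above intermediate result we have that

  $\mathrm{eval}(A, f) \subseteq \mathrm{post}(\mathrm{assign}(A, a), l)$.

For the other part

  TS   $\mathrm{post}(\mathrm{assign}(A, a), l) \supseteq \mathrm{post}(A, l)$
  -----
  H4   Lemma 3.28, monotonicity of post.
  -----
  D1   $\mathrm{assign}(A, a) \supseteq A$    (D0)
  ✠    $\mathrm{post}(\mathrm{assign}(A, a), l) \supseteq \mathrm{post}(A, l)$    (D1, H4)

For the remaining inclusion (C3.1), let $m \in \mathcal{L}$ be such that $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$.
We need to show that $m \in \mathrm{eval}(A, f) \cup \mathrm{post}(A, l)$ too: to do this we show that
$m \notin \mathrm{eval}(A, f) \implies m \in \mathrm{post}(A, l)$.

  TS   $m \in \mathrm{post}(A, l)$
  -----
  H4   $m \notin \mathrm{eval}(A, f)$    (C3.1)
  H5   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (C3.1)
  -----
  D1   $(l, m) \notin \mathrm{eval}(A, e) \times \mathrm{eval}(A, f)$    (H4)
  D2   $(l, m) \in \mathrm{assign}(A, a)$    (H5, H1)
  D3   $(l, m) \in A$    (D2, D1, D0)
  ✠    $m \in \mathrm{post}(A, l)$    (D3, H1)

□

Lemma 3.36. (Monotonicity of the assignment.) Let $A, B \in \mathcal{A}$ and
$a \in \mathrm{Assignments}$; then

  $\bigl(A \subseteq B \land \gamma(A) \neq \emptyset\bigr) \implies \mathrm{assign}(A, a) \subseteq \mathrm{assign}(B, a)$.

Proof. Let $A, B \in \mathcal{A}$ be such that $A \subseteq B$ and $\gamma(A) \neq \emptyset$. Let $(e, f) = a \in
\mathrm{Assignments}$. We have to prove that $\mathrm{assign}(A, a) \subseteq \mathrm{assign}(B, a)$. Then let $l, m \in
\mathcal{L}$ be such that $(l, m) \in \mathrm{assign}(A, a)$. To prove this lemma we have to prove that
$(l, m) \in \mathrm{assign}(B, a)$ too. Thus we have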
  TS   $(l, m) \in \mathrm{assign}(B, a)$
  -----
  H0   $A \subseteq B$
  H1   $\gamma(A) \neq \emptyset$
  H2   $(l, m) \in \mathrm{assign}(A, a)$
  H3   Lemma 3.35, effects of the assignment.
  H4   Lemma 3.31, monotonicity of eval.
  H5   Definition 3.3, the post function.
  H6   Lemma 3.29, monotonicity of the extended post 1.
  -----
  D0   $\mathrm{eval}(A, e) \subseteq \mathrm{eval}(B, e)$    (H0, H4)
  D1   $\mathrm{eval}(A, f) \subseteq \mathrm{eval}(B, f)$    (H0, H4)
  D2   $\mathrm{post}(A, l) \subseteq \mathrm{post}(B, l)$    (H0, H6)

We distinguish two cases.

  $l \in \mathrm{eval}(A, e)$;    (C1)
  $l \notin \mathrm{eval}(A, e)$.    (C2)

For the first case.

  H7   $l \in \mathrm{eval}(A, e)$    (C1)
  -----
  D3   $l \in \mathrm{eval}(B, e)$    (H7, D0)
  D4   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (H2, H5)
  D5   $\mathrm{eval}(B, f) \subseteq \mathrm{post}(\mathrm{assign}(B, a), l)$    (D3, H3)

Note that from H7 it follows that $\# \mathrm{eval}(A, e) \geq 1$, and now we distinguish the two
sub-cases

  $\# \mathrm{eval}(A, e) = 1$;    (C1.1)
  $\# \mathrm{eval}(A, e) > 1$;    (C1.2)

which cover all the possibilities. Now the first sub-case.

  H8   $\# \mathrm{eval}(A, e) = 1$    (C1.1)
  -----
  D6   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{eval}(A, f)$    (H7, H8, H3)
  D7   $m \in \mathrm{eval}(A, f)$    (D4, D6)
  D8   $m \in \mathrm{eval}(B, f)$    (D7, D1)
  D9   $m \in \mathrm{post}(\mathrm{assign}(B, a), l)$    (D8, D5)
  ✠    $(l, m) \in \mathrm{assign}(B, a)$    (D9, H5)

For the other sub-case

  H8   $\# \mathrm{eval}(A, e) > 1$    (C1.2)
  -----
  D6   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{eval}(A, f) \cup \mathrm{post}(A, l)$    (H7, H8, H3)
  D7   $\# \mathrm{eval}(B, e) > 1$    (H8, D0)
  D8   $\mathrm{post}(\mathrm{assign}(B, a), l) = \mathrm{eval}(B, f) \cup \mathrm{post}(B, l)$    (D3, D7, H3)
  D9   $\mathrm{post}(\mathrm{assign}(A, a), l) \subseteq \mathrm{post}(\mathrm{assign}(B, a), l)$    (D6, D8, D1, D2)
  D10  $m \in \mathrm{post}(\mathrm{assign}(B, a), l)$    (D4, D9)
  ✠    $(l, m) \in \mathrm{assign}(B, a)$    (D10, H5)

This completes the first case (C1). Now the second case (C2).
  H7   $l \notin \mathrm{eval}(A, e)$    (C2)
  -----
  D3   $\mathrm{post}(\mathrm{assign}(A, a), l) = \mathrm{post}(A, l)$    (H7, H3)
  D4   $m \in \mathrm{post}(\mathrm{assign}(A, a), l)$    (H2, H5)
  D5   $m \in \mathrm{post}(A, l)$    (D4, D3)
  D6   $m \in \mathrm{post}(B, l)$    (D5, D2)

Also in the second case we distinguish two sub-cases.

  $l \notin \mathrm{eval}(B, e)$;    (C2.1)
  $l \in \mathrm{eval}(B, e)$.    (C2.2)

Now the first sub-case.

  H8   $l \notin \mathrm{eval}(B, e)$    (C2.1)
  -----
  D7   $\mathrm{post}(\mathrm{assign}(B, a), l) = \mathrm{post}(B, l)$    (H8, H3)
  D8   $m \in \mathrm{post}(\mathrm{assign}(B, a), l)$    (D6, D7)
  ✠    $(l, m) \in \mathrm{assign}(B, a)$    (D8, H5)

Now the second sub-case.

  H8   $l \in \mathrm{eval}(B, e)$    (C2.2)
  H9   Lemma 3.33, eval cardinality on the abstract domain.
  -----
  D7   $\# \mathrm{eval}(A, e) > 0$    (H1, H9)
  D8   $\# \mathrm{eval}(A, e) \leq \# \mathrm{eval}(B, e)$    (D0)
  D9   $\# \mathrm{eval}(A, e) < \# \mathrm{eval}(B, e)$    (D8, H7, H8)
  D10  $\# \mathrm{eval}(B, e) > 1$    (D9, D7)
  D11  $B \subseteq \mathrm{assign}(B, a)$    (D10, H3)
  D12  $(l, m) \in B$    (D6, H5)
  ✠    $(l, m) \in \mathrm{assign}(B, a)$    (D12, D11)

□

It is worth stressing that $\gamma(A) \neq \emptyset$ is a necessary hypothesis of Lemma 3.36.
Consider indeed the following example: $\mathcal{L} = \{l, m, n\}$ and $A, B \in \mathcal{A}$ such that
$A = \{(m, n)\}$ and $B = \{(l, m), (m, n)\}$. We obviously have $A \subseteq B$. Consider
what happens to the arc $(m, n)$ when the assignment $a = (*l, l)$ is performed:
$\mathrm{eval}(A, *l) = \emptyset$ while $\mathrm{eval}(B, *l) = \{m\}$, resulting in $\mathrm{assign}(A, a) = A$ and
$\mathrm{assign}(B, a) = \{(l, m), (m, l)\}$. Thus $\mathrm{assign}(A, a) \not\subseteq \mathrm{assign}(B, a)$.



[Figure: three points-to graphs over the locations l, m, n. Left: the abstraction A = assign(A, a). Center: the abstraction B. Right: the abstraction assign(B, a).]
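The counterexample can be replayed mechanically. The sketch below takes the assignment to be a = (*l, l), which is the reading consistent with the results stated in the example, and uses an assumed form of Definition 3.10 (strong update on a singleton left-hand side, weak update otherwise); function names and the `("*", e)` encoding are illustrative.

```python
def post(desc, locs):
    return {m for (l, m) in desc if l in locs}

def eval_expr(desc, e):
    # A location is a string; ("*", e) encodes the dereference *e.
    return {e} if isinstance(e, str) else post(desc, eval_expr(desc, e[1]))

def assign(desc, a, locations):
    """Assumed reading of Definition 3.10 for the assignment a = (e, f)."""
    e, f = a
    lhs, rhs = eval_expr(desc, e), eval_expr(desc, f)
    new_arcs = {(l, m) for l in lhs for m in rhs}
    if len(lhs) == 1:
        killed = {(l, m) for l in lhs for m in locations}
        return new_arcs | (desc - killed)
    return new_arcs | desc

LOCS = {"l", "m", "n"}
A = {("m", "n")}                  # l has no outgoing arc, so gamma(A) is empty
B = {("l", "m"), ("m", "n")}
a = (("*", "l"), "l")             # the assignment *l := l

assigned_A = assign(A, a, LOCS)   # weak update on an empty target set
assigned_B = assign(B, a, LOCS)   # strong update on the single target m
monotone_here = assigned_A <= assigned_B
```

Here `monotone_here` is false: the arc (m, n) survives in `assigned_A` but is killed by the strong update in `assigned_B`.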
Proof. (Correctness of the assignment, Theorem 3.26.) Let $A \in \mathcal{A}$, let
$C \in \gamma(A)$ and let $a \in \mathrm{Assignments}$.

  H0   $C \in \gamma(A)$
  H1   Definition 3.2, the concretization function.
  H2   Lemma 3.36, monotonicity of the assignment.
  -----
  D0   $C \subseteq A$    (H0, H1)
  D1   $C \in \gamma(C)$    (H1)
  D2   $\gamma(C) \neq \emptyset$    (D1)
  D3   $\mathrm{assign}(C, a) \subseteq \mathrm{assign}(A, a)$    (D0, D2, H2)
  D4   $\mathrm{assign}(C, a) \in \gamma(\mathrm{assign}(A, a))$    (D3)



To proceed with the proof of the correctness of the filter abstract operation, we
now reformulate all the previous lemmas on the post function for the prev function.

Definition 3.37. (Transposed abstract domain.) Let

  $\mathrm{tran} \colon \mathcal{A} \to \mathcal{A}$

be defined, for all $A \in \mathcal{A}$, as

  $\mathrm{tran}(A) = \{(m, l) \mid (l, m) \in A\}$.

Lemma 3.38. (Transpose is idempotent.) Let $A \in \mathcal{A}$; then

  $\mathrm{tran}(\mathrm{tran}(A)) = A$.

Proof. This result can be easily derived from the definition of the transpose
function (Definition 3.37). □

Lemma 3.39. (Duality of prev and post.) Let $A \in \mathcal{A}$ and $l \in \mathcal{L}$; then

  $\mathrm{post}(A, l) = \mathrm{prev}(\mathrm{tran}(A), l)$;
  $\mathrm{post}(\mathrm{tran}(A), l) = \mathrm{prev}(A, l)$.

Proof. Let $A \in \mathcal{A}$ and let $l \in \mathcal{L}$.

  TS   $\mathrm{post}(A, l) = \mathrm{prev}(\mathrm{tran}(A), l)$
  -----
  H0   Definition 3.3, the prev function.
  H1   Definition 3.3, the post function.
  H2   Definition 3.37, transposed abstract domain.

We proceed by proving separately the two inclusions

  $\mathrm{post}(A, l) \subseteq \mathrm{prev}(\mathrm{tran}(A), l)$;    (C1)
  $\mathrm{post}(A, l) \supseteq \mathrm{prev}(\mathrm{tran}(A), l)$.    (C2)

Let $m \in \mathcal{L}$. For the first inclusion (C1)

  TS   $m \in \mathrm{prev}(\mathrm{tran}(A), l)$
  -----
  H3   $m \in \mathrm{post}(A, l)$    (C1)
  -----
  D0   $(l, m) \in A$    (H3, H1)
  D1   $(m, l) \in \mathrm{tran}(A)$    (D0, H2)
  ✠    $m \in \mathrm{prev}(\mathrm{tran}(A), l)$    (D1, H0)

For the second inclusion (C2)

  TS   $m \in \mathrm{post}(A, l)$
  -----
  H3   $m \in \mathrm{prev}(\mathrm{tran}(A), l)$    (C2)
  -----
  D0   $(m, l) \in \mathrm{tran}(A)$    (H3, H0)
  D1   $(l, m) \in A$    (D0, H2)
  ✠    $m \in \mathrm{post}(A, l)$    (D1, H1)

The other half of this lemma can be proved by observing that the transpose function
is idempotent and applying this result to the first part of the lemma.

  TS   $\mathrm{post}(\mathrm{tran}(A), l) = \mathrm{prev}(A, l)$
  -----
  H0   $\mathrm{post}(A, l) = \mathrm{prev}(\mathrm{tran}(A), l)$
  H1   Lemma 3.38, transpose is idempotent.
  -----
  D0   $\mathrm{prev}(A, l) = \mathrm{prev}(\mathrm{tran}(\mathrm{tran}(A)), l)$    (H1)
  D1   $\mathrm{prev}(\mathrm{tran}(\mathrm{tran}(A)), l) = \mathrm{post}(\mathrm{tran}(A), l)$    (H0)
  ✠    $\mathrm{post}(\mathrm{tran}(A), l) = \mathrm{prev}(A, l)$    (D1, D0)

□
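The duality between prev and post is easy to observe on a concrete arc set. The sketch below assumes the set-of-arcs representation and the extended (set-argument) forms of post and prev; the function names are illustrative.

```python
def post(desc, locs):
    """Targets of the arcs leaving any location in `locs`."""
    return {m for (l, m) in desc if l in locs}

def prev(desc, locs):
    """Sources of the arcs entering any location in `locs`."""
    return {l for (l, m) in desc if m in locs}

def tran(desc):
    """Transpose: reverse every arc."""
    return {(m, l) for (l, m) in desc}

A = {("l", "m"), ("l", "n"), ("m", "n")}

duality_holds = all(
    post(A, {loc}) == prev(tran(A), {loc})
    and post(tran(A), {loc}) == prev(A, {loc})
    for loc in ("l", "m", "n")
)
involution_holds = tran(tran(A)) == A
```

Because transposing twice returns the original arc set, any monotonicity result proved for post transfers to prev by applying it to the transposed abstraction.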

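Lemma 3.40. (Monotonicity of prev.) Let $A, B \in \mathcal{A}$ and $l \in \mathcal{L}$; then

  $A \subseteq B \implies \mathrm{prev}(A, l) \subseteq \mathrm{prev}(B, l)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$ and let $l \in \mathcal{L}$.

  TS   $\mathrm{prev}(A, l) \subseteq \mathrm{prev}(B, l)$
  -----
  H0   $A \subseteq B$
  H1   Lemma 3.39, the duality of prev and post.
  H2   Definition 3.37, transposed abstract domain.
  H3   Lemma 3.28, the monotonicity of post.
  -----
  D0   $\mathrm{prev}(A, l) = \mathrm{post}(\mathrm{tran}(A), l)$    (H1)
  D1   $\mathrm{prev}(B, l) = \mathrm{post}(\mathrm{tran}(B), l)$    (H1)
  D2   $\mathrm{tran}(A) \subseteq \mathrm{tran}(B)$    (H2, H0)
  D3   $\mathrm{post}(\mathrm{tran}(A), l) \subseteq \mathrm{post}(\mathrm{tran}(B), l)$    (D2, H3)
  ✠    $\mathrm{prev}(A, l) \subseteq \mathrm{prev}(B, l)$    (D3, D1, D0)

□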

Lemma 3.41. (Duality of extended prev and post.) Let $A \in \mathcal{A}$ and $L \subseteq \mathcal{L}$;
then

  $\mathrm{post}(A, L) = \mathrm{prev}(\mathrm{tran}(A), L)$;
  $\mathrm{post}(\mathrm{tran}(A), L) = \mathrm{prev}(A, L)$.

Proof. This result comes easily from the definition of the extended prev and
post functions (Definition 3.4) by applying the result of duality of prev and post
(Lemma 3.39). □

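Lemma 3.42. (Monotonicity of the extended prev 1.) Let $A, B \in \mathcal{A}$ and
$L \subseteq \mathcal{L}$; then

  $A \subseteq B \implies \mathrm{prev}(A, L) \subseteq \mathrm{prev}(B, L)$.

Proof. Let $A, B \in \mathcal{A}$ such that $A \subseteq B$ and let $L \subseteq \mathcal{L}$.

  TS   $\mathrm{prev}(A, L) \subseteq \mathrm{prev}(B, L)$
  -----
  H0   $A \subseteq B$
  H1   Lemma 3.41, the duality of extended prev and post.
  H2   Definition 3.37, transposed abstract domain.
  H3   Lemma 3.29, the monotonicity of extended post 1.
  -----
  D0   $\mathrm{prev}(A, L) = \mathrm{post}(\mathrm{tran}(A), L)$    (H1)
  D1   $\mathrm{prev}(B, L) = \mathrm{post}(\mathrm{tran}(B), L)$    (H1)
  D2   $\mathrm{tran}(A) \subseteq \mathrm{tran}(B)$    (H2, H0)
  D3   $\mathrm{post}(\mathrm{tran}(A), L) \subseteq \mathrm{post}(\mathrm{tran}(B), L)$    (D2, H3)
  ✠    $\mathrm{prev}(A, L) \subseteq \mathrm{prev}(B, L)$    (D3, D1, D0)

□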



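Lemma 3.43. (Monotonicity of the extended prev 2.) Let $A \in \mathcal{A}$ and
$L, M \subseteq \mathcal{L}$; then

  $L \subseteq M \implies \mathrm{prev}(A, L) \subseteq \mathrm{prev}(A, M)$.

Proof. Let $A \in \mathcal{A}$ and let $L \subseteq M \subseteq \mathcal{L}$.

  TS   $\mathrm{prev}(A, L) \subseteq \mathrm{prev}(A, M)$
  -----
  H0   $L \subseteq M$
  H1   Lemma 3.41, the duality of extended prev and post.
  H2   Lemma 3.30, the monotonicity of extended post 2.
  -----
  D0   $\mathrm{prev}(A, L) = \mathrm{post}(\mathrm{tran}(A), L)$    (H1)
  D1   $\mathrm{prev}(A, M) = \mathrm{post}(\mathrm{tran}(A), M)$    (H1)
  D2   $\mathrm{post}(\mathrm{tran}(A), L) \subseteq \mathrm{post}(\mathrm{tran}(A), M)$    (H0, H2)
  ✠    $\mathrm{prev}(A, L) \subseteq \mathrm{prev}(A, M)$    (D2, D1, D0)

□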



Lemma 3.44. (Location closure.) Let $A \in \mathcal{A}$ and $l \in \mathcal{L}$; then

  $\mathrm{post}(A, l) \neq \emptyset \implies l \in \mathrm{prev}(A, \mathrm{post}(A, l))$.

Proof. Let $A \in \mathcal{A}$ and let $l \in \mathcal{L}$ such that $\mathrm{post}(A, l) \neq \emptyset$. Let then
$m \in \mathrm{post}(A, l)$.

  TS   $l \in \mathrm{prev}(A, \mathrm{post}(A, l))$
  -----
  H0   $m \in \mathrm{post}(A, l)$
  H1   Lemma 3.43, monotonicity of extended prev 2.
  H2   Definition 3.3, the prev function.
  H3   Definition 3.3, the post function.
  -----
  D0   $\mathrm{prev}(A, \{m\}) \subseteq \mathrm{prev}(A, \mathrm{post}(A, l))$    (H0, H1)
  D1   $(l, m) \in A$    (H0, H3)
  D2   $l \in \mathrm{prev}(A, \{m\})$    (D1, H2)
  ✠    $l \in \mathrm{prev}(A, \mathrm{post}(A, l))$    (D2, D0)

□

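Lemma 3.45. (Extended location closure.) Let $A \in \mathcal{A}$ and $L \subseteq \mathcal{L}$; then

  $\gamma(A) \neq \emptyset \implies L \subseteq \mathrm{prev}(A, \mathrm{post}(A, L))$.

Proof. Let $A \in \mathcal{A}$ such that $\gamma(A) \neq \emptyset$ and let $L \subseteq \mathcal{L}$. If $L = \emptyset$ then the thesis
is trivially verified. Otherwise, let $l \in L$.

  TS   $l \in \mathrm{prev}(A, \mathrm{post}(A, L))$
  -----
  H0   $l \in L$
  H1   $\gamma(A) \neq \emptyset$
  H2   Definition 3.2, concretization function.
  H3   Lemma 3.44, location closure.
  H4   Lemma 3.30, monotonicity of post 2.
  H5   Lemma 3.43, monotonicity of prev 2.
  -----
  D0   $\mathrm{post}(A, l) \neq \emptyset$    (H1, H2)
  D1   $l \in \mathrm{prev}(A, \mathrm{post}(A, \{l\}))$    (D0, H3)
  D2   $\mathrm{post}(A, \{l\}) \subseteq \mathrm{post}(A, L)$    (H0, H4)
  D3   $\mathrm{prev}(A, \mathrm{post}(A, \{l\})) \subseteq \mathrm{prev}(A, \mathrm{post}(A, L))$    (D2, H5)
  ✠    $l \in \mathrm{prev}(A, \mathrm{post}(A, L))$    (D1, D3)

□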
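A small sketch makes the role of the hypothesis on the concretization visible. It assumes that the concretization of an abstraction is non-empty exactly when every location has at least one outgoing arc, which is how the proof uses Definition 3.2; the names below are illustrative.

```python
def post(desc, locs):
    return {m for (l, m) in desc if l in locs}

def prev(desc, locs):
    return {l for (l, m) in desc if m in locs}

def closure_holds(desc, locs):
    """Check L <= prev(A, post(A, L)) for the given abstraction and set."""
    return locs <= prev(desc, post(desc, locs))

# Every location of {p, q, r} has a successor: the closure property holds.
A = {("p", "q"), ("p", "r"), ("q", "q"), ("r", "q")}
ok = closure_holds(A, {"p", "q"})

# Over the locations {p, q, s}, the location s has no outgoing arc, so no
# concrete memory is contained in A_bottom and the property can fail.
A_bottom = {("p", "q"), ("q", "q")}
fails = not closure_holds(A_bottom, {"s"})
```

In the second case post of {s} is empty, so its set of predecessors is empty as well and cannot contain s.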

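Lemma 3.46. (Monotonicity of extended eval 3.) Let $A \in \mathcal{A}$, $e \in \mathrm{Expr}$
and $i \in \mathbb{N}$; then

  $\gamma(A) \neq \emptyset \implies \mathrm{eval}(A, e, i + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, e, i))$.

Proof. Let $A \in \mathcal{A}$ such that $\gamma(A) \neq \emptyset$, let $e \in \mathrm{Expr}$ and let $i \in \mathbb{N}$. We proceed
by induction on the definition of the extended eval function.

  TS   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, e, i))$
  -----
  H0   Definition 3.14, the extended eval function.
  H1   $\gamma(A) \neq \emptyset$

For every $i$ let $e = l \in \mathcal{L}$.

  TS   $\mathrm{eval}(A, l, i + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, l, i))$
  -----
  D0   $\mathrm{eval}(A, l, i + 1) = \emptyset$    (H0)
  ✠    $\mathrm{eval}(A, l, i + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, l, i))$    (D0)

Let $e = *f$ with $f \in \mathrm{Expr}$. For $i = 0$

  TS   $\mathrm{eval}(A, *f, 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, *f, 0))$
  -----
  H2   Definition 3.6, the eval function.
  H3   Lemma 3.45, the extended location closure.
  -----
  D0   $\mathrm{eval}(A, *f, 1) = \mathrm{eval}(A, f)$    (H0)
  D1   $\mathrm{eval}(A, *f, 0) = \mathrm{eval}(A, *f)$    (H0)
  D2   $\mathrm{eval}(A, *f) = \mathrm{post}(A, \mathrm{eval}(A, f))$    (H2)
  D3   $\mathrm{eval}(A, *f, 0) = \mathrm{post}(A, \mathrm{eval}(A, f))$    (D1, D2)
  D4   $\mathrm{eval}(A, f) \subseteq \mathrm{prev}(A, \mathrm{post}(A, \mathrm{eval}(A, f)))$    (H3, H1)
  D5   $\mathrm{eval}(A, f) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, *f, 0))$    (D3, D4)
  ✠    $\mathrm{eval}(A, *f, 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, *f, 0))$    (D5, D0)

For $i > 0$, for convenience of notation let $i = j + 1$.

  TS   $\mathrm{eval}(A, *f, j + 2) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, *f, j + 1))$
  -----
  H2   $\mathrm{eval}(A, f, j + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, f, j))$    (ind. hyp.)
  -----
  D0   $\mathrm{eval}(A, *f, j + 2) = \mathrm{eval}(A, f, j + 1)$    (H0)
  D1   $\mathrm{eval}(A, *f, j + 1) = \mathrm{eval}(A, f, j)$    (H0)
  ✠    $\mathrm{eval}(A, *f, j + 2) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, *f, j + 1))$    (D1, D0, H2)

□

Lemma 3.47. (Monotonicity of extended eval 3b.) Let $A \in \mathcal{A}$, $e \in \mathrm{Expr}$
and $i \in \mathbb{N}$; then

  $\gamma(A) \neq \emptyset \implies \mathrm{post}(A, \mathrm{eval}(A, e, i + 1)) \subseteq \mathrm{eval}(A, e, i)$.

Proof. Let $A \in \mathcal{A}$ such that $\gamma(A) \neq \emptyset$, let $e \in \mathrm{Expr}$ and let $i \in \mathbb{N}$. We proceed
again by induction on the definition of the extended eval function.

  TS   $\mathrm{post}(A, \mathrm{eval}(A, e, i + 1)) \subseteq \mathrm{eval}(A, e, i)$
  -----
  H0   Definition 3.14, the extended eval function.
  H1   $\gamma(A) \neq \emptyset$

For every $i$ let $e = l \in \mathcal{L}$.

  TS   $\mathrm{post}(A, \mathrm{eval}(A, l, i + 1)) \subseteq \mathrm{eval}(A, l, i)$
  -----
  H2   Definition 3.3, post function.
  -----
  D0   $\mathrm{eval}(A, l, i + 1) = \emptyset$    (H0)
  D1   $\mathrm{post}(A, \mathrm{eval}(A, l, i + 1)) = \emptyset$    (D0, H2)
  ✠    $\mathrm{post}(A, \mathrm{eval}(A, l, i + 1)) \subseteq \mathrm{eval}(A, l, i)$    (D1)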



Let $e = *f$ with $f \in \mathrm{Expr}$. For $i = 0$

  TS   $\mathrm{post}(A, \mathrm{eval}(A, *f, 1)) \subseteq \mathrm{eval}(A, *f, 0)$
  -----
  H2   Definition 3.6, the eval function.
  H3   Lemma 3.45, the extended location closure.
  -----
  D0   $\mathrm{eval}(A, *f, 1) = \mathrm{eval}(A, f)$    (H0)
  D1   $\mathrm{post}(A, \mathrm{eval}(A, *f, 1)) = \mathrm{post}(A, \mathrm{eval}(A, f))$    (D0)
  D2   $\mathrm{post}(A, \mathrm{eval}(A, *f, 1)) = \mathrm{eval}(A, *f)$    (D1, H2)
  ✠    $\mathrm{post}(A, \mathrm{eval}(A, *f, 1)) = \mathrm{eval}(A, *f, 0)$    (D2, H0)

For $i > 0$, for convenience of notation let $i = j + 1$.

  TS   $\mathrm{post}(A, \mathrm{eval}(A, *f, j + 2)) \subseteq \mathrm{eval}(A, *f, j + 1)$
  -----
  H2   $\mathrm{post}(A, \mathrm{eval}(A, f, j + 1)) \subseteq \mathrm{eval}(A, f, j)$    (ind. hyp.)
  -----
  D0   $\mathrm{eval}(A, *f, j + 2) = \mathrm{eval}(A, f, j + 1)$    (H0)
  D1   $\mathrm{eval}(A, *f, j + 1) = \mathrm{eval}(A, f, j)$    (H0)
  ✠    $\mathrm{post}(A, \mathrm{eval}(A, *f, j + 2)) \subseteq \mathrm{eval}(A, *f, j + 1)$    (D1, D0, H2)

□
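The two lemmas on the extended eval function can be exercised on a small example. The sketch below uses an assumed reading of Definition 3.14, reconstructed from the proof tables: eval(A, e, 0) = eval(A, e), eval(A, l, i + 1) is empty, and eval(A, *e, i + 1) = eval(A, e, i). The function names and the `("*", e)` encoding are illustrative.

```python
def post(desc, locs):
    return {m for (l, m) in desc if l in locs}

def prev(desc, locs):
    return {l for (l, m) in desc if m in locs}

def eval_expr(desc, e):
    return {e} if isinstance(e, str) else post(desc, eval_expr(desc, e[1]))

def eval_ext(desc, e, i):
    """Assumed extended eval: strip i dereferences from e, then evaluate."""
    if i == 0:
        return eval_expr(desc, e)
    if isinstance(e, str):
        return set()              # a plain location has nothing to strip
    return eval_ext(desc, e[1], i - 1)

# Every location has a successor, so the concretization is non-empty.
A = {("p", "q"), ("p", "r"), ("q", "x"), ("r", "x"), ("x", "x")}
e = ("*", ("*", "p"))             # the expression **p

levels = [eval_ext(A, e, i) for i in range(4)]
lemma_3_46 = all(eval_ext(A, e, i + 1) <= prev(A, eval_ext(A, e, i))
                 for i in range(4))
lemma_3_47 = all(post(A, eval_ext(A, e, i + 1)) <= eval_ext(A, e, i)
                 for i in range(4))
```

On this example `levels` walks backwards along the dereference chain: the targets of **p, then the targets of *p, then p itself, then the empty set.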



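Lemma 3.48. (Monotonicity of target.) Let $A, B \in \mathcal{A}$, $e \in \mathrm{Expr}$, $L \subseteq \mathcal{L}$ and
$i, j \in \mathbb{N}$; then

  $\bigl(A \subseteq B \land i \leq j \land \gamma(A) \neq \emptyset\bigr) \implies$
  $\bigl(\mathrm{eval}(A, e, i) \subseteq \mathrm{targ}(B, L, e, i) \implies \mathrm{eval}(A, e, j) \subseteq \mathrm{targ}(B, L, e, j)\bigr)$.

Proof. Note that if $i = j$ then the consequent of the implication in the above
statement is always true, thus the thesis is trivially verified. For the case $i < j$ we
will prove that

  $\bigl(A \subseteq B \land \gamma(A) \neq \emptyset\bigr) \implies$
  $\bigl(\mathrm{eval}(A, e, i) \subseteq \mathrm{targ}(B, L, e, i) \implies \mathrm{eval}(A, e, i + 1) \subseteq \mathrm{targ}(B, L, e, i + 1)\bigr)$

as this implies, by a trivial induction on $i$, the original result. Let $A, B \in \mathcal{A}$ such
that $A \subseteq B$ and $\gamma(A) \neq \emptyset$, let $i \in \mathbb{N}$, $e \in \mathrm{Expr}$ and $L \subseteq \mathcal{L}$.

  TS   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{targ}(B, L, e, i + 1)$
  -----
  H0   $A \subseteq B$
  H1   $\mathrm{eval}(A, e, i) \subseteq \mathrm{targ}(B, L, e, i)$
  H2   $\gamma(A) \neq \emptyset$
  H3   Definition 3.15, the target function.
  H4   Lemma 3.46, monotonicity of the extended eval 3.
  H5   Lemma 3.42, monotonicity of the extended prev 1.
  H6   Lemma 3.43, monotonicity of the extended prev 2.
  H7   Lemma 3.32, monotonicity of the extended eval 1.
  -----
  D0   $\mathrm{targ}(B, L, e, i + 1) = \mathrm{eval}(B, e, i + 1) \cap \mathrm{prev}(B, \mathrm{targ}(B, L, e, i))$    (H3)
  D1   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{eval}(B, e, i + 1)$    (H7)
  D2   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{prev}(A, \mathrm{eval}(A, e, i))$    (H2, H4)
  D3   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{prev}(B, \mathrm{eval}(A, e, i))$    (D2, H0, H5)
  D4   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{prev}(B, \mathrm{targ}(B, L, e, i))$    (D3, H1, H6)
  D5   $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{eval}(B, e, i + 1) \cap \mathrm{prev}(B, \mathrm{targ}(B, L, e, i))$    (D1, D4)
  ✠    $\mathrm{eval}(A, e, i + 1) \subseteq \mathrm{targ}(B, L, e, i + 1)$    (D5, D0)

□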



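Lemma 3.49. (Generalized correctness of filter 2.) Let $A, B \in \mathcal{A}$, $L \subseteq \mathcal{L}$
and $e \in \mathrm{Expr}$; then

  $\bigl(A \subseteq B \land \gamma(A) \neq \emptyset \land \mathrm{eval}(A, e) \subseteq L\bigr) \implies A \subseteq \phi(B, L, e)$.

Proof. Let $A, B \in \mathcal{A}$, let $L \subseteq \mathcal{L}$ and let $e \in \mathrm{Expr}$. We distinguish two cases

  $e = l \in \mathcal{L}$;    (C1)
  $e = *f \in \mathrm{Expr} \setminus \mathcal{L}$.    (C2)

For the first case (C1) let $l \in \mathcal{L}$. We have

  TS   $A \subseteq \phi(B, L, l)$
  -----
  H0   Definition 3.17, filter 2.
  H1   $\mathrm{eval}(A, l) \subseteq L$
  H2   Definition 3.6, the eval function.
  H3   $A \subseteq B$
  -----
  D0   $\mathrm{eval}(A, l) = \{l\}$    (H2)
  D1   $l \in L$    (D0, H1)
  D2   $\mathrm{eval}(B, l) = \{l\}$    (H2)
  D3   $\mathrm{eval}(B, l) \subseteq L$    (D1, D2)
  D4   $\phi(B, L, l) = B$    (H0, D3)
  ✠    $A \subseteq \phi(B, L, l)$    (D4, H3)

Now the second case (C2).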
  TS   $A \subseteq \phi(B, L, e)$
  -----
  H0   $\mathrm{eval}(A, e) \subseteq L$
  H1   $A \subseteq B$
  H2   $\gamma(A) \neq \emptyset$
  H3   Definition 3.17, filter 2.
  H4   Definition 3.16, filter 1.
  -----
  T0   $A \subseteq \bigcap_{i \in \mathbb{N}} \phi(B, L, e, i)$    (TS, H3)
  T1   $\forall i \in \mathbb{N} : A \subseteq \phi(B, L, e, i)$    (T0)
  T2   $\forall (l, m) \in A : \forall i \in \mathbb{N} : (l, m) \in \phi(B, L, e, i)$    (T1)

To prove TS we will prove the equivalent result T2. Let $(l, m) \in A$ and let $i \in \mathbb{N}$.

  TS   $(l, m) \in \phi(B, L, e, i)$
  -----
  H5   $(l, m) \in A$

We proceed by induction on $i$. For the base case let $i = 0$.

  TS   $(l, m) \in \phi(B, L, e, 0)$
  -----
  D0   $\phi(B, L, e, 0) = B$    (H4)
  D1   $(l, m) \in B$    (H5, H1)
  ✠    $(l, m) \in \phi(B, L, e, 0)$    (D1, D0)

For the inductive case let $i > 0$. For convenience of notation let $j \in \mathbb{N}$ be such that
$i = j + 1$.

  TS   $(l, m) \in \phi(B, L, e, j + 1)$
  -----
  H6   $(l, m) \in \phi(B, L, e, j)$    (ind. hyp.)

We distinguish two cases depending on the cardinality of the target set

  $\# \mathrm{targ}(B, L, e, j + 1) \neq 1$;    (C2.1)
  $\# \mathrm{targ}(B, L, e, j + 1) = 1$.    (C2.2)

For the first case (C2.1), we have

  H7   $\# \mathrm{targ}(B, L, e, j + 1) \neq 1$    (C2.1)
  -----
  D0   $\phi(B, L, e, j + 1) = \phi(B, L, e, j)$    (H7, H4)
  ✠    $(l, m) \in \phi(B, L, e, j + 1)$    (D0, H6)

For the second case (C2.2), assume that

  H7   $\# \mathrm{targ}(B, L, e, j + 1) = 1$    (C2.2)
  -----
  D0   $\phi(B, L, e, j + 1) = \phi(B, L, e, j) \setminus \bigl(\mathrm{targ}(B, L, e, j + 1) \times (\mathcal{L} \setminus \mathrm{targ}(B, L, e, j))\bigr)$    (H7, H4)

We distinguish two sub-cases

  $l \notin \mathrm{targ}(B, L, e, j + 1)$;    (C2.2.1)
  $l \in \mathrm{targ}(B, L, e, j + 1)$.    (C2.2.2)

For the first sub-case (C2.2.1) assume that
  H8   $l \notin \mathrm{targ}(B, L, e, j + 1)$    (C2.2.1)
  -----
  D1   $(l, m) \notin \mathrm{targ}(B, L, e, j + 1) \times (\mathcal{L} \setminus \mathrm{targ}(B, L, e, j))$    (H8)
  ✠    $(l, m) \in \phi(B, L, e, j + 1)$    (D1, D0, H6)

For the second sub-case (C2.2.2) we have

  H8   $l \in \mathrm{targ}(B, L, e, j + 1)$    (C2.2.2)
  H9   Lemma 3.31, monotonicity of eval.
  H10  Definition 3.15, the target function.
  H11  Definition 3.14, the extended eval function.
  H12  Lemma 3.48, monotonicity of target.
  H13  Lemma 3.34, ext. eval cardinality.
  H14  Definition 3.3, the post function.
  H15  Lemma 3.47, monotonicity of ext. eval 3b.
  -----
  D1   $\mathrm{eval}(A, e) \subseteq \mathrm{eval}(B, e)$    (H1, H9)
  D2   $\mathrm{eval}(A, e) \subseteq L \cap \mathrm{eval}(B, e)$    (D1, H0)
  D3   $\mathrm{targ}(B, L, e, 0) = L \cap \mathrm{eval}(B, e)$    (H10)
  D4   $\mathrm{eval}(A, e) \subseteq \mathrm{targ}(B, L, e, 0)$    (D2, D3)
  D5   $\mathrm{eval}(A, e, 0) \subseteq \mathrm{targ}(B, L, e, 0)$    (D4, H11)
  D6   $\mathrm{eval}(A, e, j + 1) \subseteq \mathrm{targ}(B, L, e, j + 1)$    (D5, H1, H2, H12)
  D7   $\mathrm{eval}(A, e, j) \subseteq \mathrm{targ}(B, L, e, j)$    (D5, H1, H2, H12)
  D8   $\mathrm{targ}(B, L, e, j + 1) = \mathrm{eval}(B, e, j + 1) \cap \mathrm{prev}(B, \mathrm{targ}(B, L, e, j))$    (H10)
  D9   $\mathrm{targ}(B, L, e, j + 1) \subseteq \mathrm{eval}(B, e, j + 1)$    (D8)
  D10  $l \in \mathrm{eval}(B, e, j + 1)$    (H8, D9)
  D11  $\# \mathrm{eval}(B, e, j + 1) > 0$    (D10)
  D12  $\# \mathrm{eval}(A, e, j + 1) > 0$    (H13, H1, H2, D11)
  D13  $\{l\} = \mathrm{targ}(B, L, e, j + 1)$    (H7, H8)
  D14  $\{l\} = \mathrm{eval}(A, e, j + 1)$    (D13, D12, D6)
  D15  $m \in \mathrm{post}(A, l)$    (H5, H14)
  D16  $m \in \mathrm{post}(A, \mathrm{eval}(A, e, j + 1))$    (D15, D14)
  D17  $\mathrm{post}(A, \mathrm{eval}(A, e, j + 1)) \subseteq \mathrm{eval}(A, e, j)$    (H2, H15)
  D18  $m \in \mathrm{eval}(A, e, j)$    (D16, D17)
  D19  $m \in \mathrm{targ}(B, L, e, j)$    (D7, D18)
  D20  $m \notin \mathcal{L} \setminus \mathrm{targ}(B, L, e, j)$    (D19)
  D21  $(l, m) \notin \mathrm{targ}(B, L, e, j + 1) \times (\mathcal{L} \setminus \mathrm{targ}(B, L, e, j))$    (D20)
  ✠    $(l, m) \in \phi(B, L, e, j + 1)$    (D21, H6, D0)

□

Lemma 3.50. (Correctness of filter 2.) Let $A \in \mathcal{A}$, $L \subseteq \mathcal{L}$ and $e \in \mathrm{Expr}$; then

$\forall C \in \gamma(A) : \mathrm{eval}(C, e) \subseteq L \implies C \in \gamma(\phi(A, L, e))$.

Proof. This is a simple corollary of Lemma 3.49. Let $A \in \mathcal{A}$ and let $C \in \gamma(A)$. Let $e \in \mathrm{Expr}$ and let $L \subseteq \mathcal{L}$ be such that $\mathrm{eval}(C, e) \subseteq L$.

TS   $C \in \gamma(\phi(A, L, e))$
H0   $\mathrm{eval}(C, e) \subseteq L$
H1   $C \in \gamma(A)$
H2   Lemma 3.49, generalized correctness of filter.
H3   Definition 3.2, concretization function.
D0   $C \in \gamma(C)$   (H3)
D1   $\gamma(C) \neq \emptyset$   (D0)
D2   $C \subseteq A$   (H1, H3)
D3   $C \subseteq \phi(A, L, e)$   (D2, D1, H0, H2)
✠    $C \in \gamma(\phi(A, L, e))$   (D3, H3)

□

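Lemma 3.51. (Equality target.) Let $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$. For convenience of notation let $c \in \mathrm{Cond}$ be such that $c = (\mathrm{eq}, e, f)$. Finally, let $C \in \phi(\gamma(A), c)$. Then

$\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$.

Proof. Let $A \in \mathcal{A}$, let $(\mathrm{eq}, e, f) = c \in \mathrm{Cond}$ and let $C \in \mathcal{C}$ be such that $C \in \phi(\gamma(A), c)$. Note that from the definition of the concrete semantics of the filter operation (Definition 3.13) we have

$\phi(\gamma(A), c) = \gamma(A) \cap \mathrm{modelset}(c)$.

Thus, $C \in \mathrm{modelset}(c)$ and $C \in \gamma(A)$.

TS   $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$
H0   $C \in \mathrm{modelset}(c)$
H1   $C \in \gamma(A)$
H2   Definition 3.9, value of conditions.
H3   Definition 3.2, concretization function.
H4   Lemma 3.31, monotonicity of the eval function.
D0   $C \models c$   (H0)
D1   $\mathrm{eval}(C, e) = \mathrm{eval}(C, f)$   (H2, D0)
D2   $C \subseteq A$   (H1, H3)
D3   $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$   (D2, H4)
D4   $\mathrm{eval}(C, f) \subseteq \mathrm{eval}(A, f)$   (D2, H4)
D5   $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, f)$   (D4, D1)
✠    $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$   (D3, D5)

□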

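Lemma 3.52. (Inequality target.) Let $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$. For convenience of notation let $c \in \mathrm{Cond}$ be such that $c = (\mathrm{neq}, e, f)$. Let $C \in \phi(\gamma(A), c)$ and let

$I = \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$,
$E = \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f)$,
$F = \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e)$.

Then

$\# I = 1 \implies \mathrm{eval}(C, e) \subseteq E \lor \mathrm{eval}(C, f) \subseteq F$.

Proof. Let $A \in \mathcal{A}$, let $(\mathrm{neq}, e, f) = c \in \mathrm{Cond}$ and let $C \in \mathcal{C}$. To show the thesis we assume that $\# I = 1$ and $\mathrm{eval}(C, e) \not\subseteq E$, and then we show that $\mathrm{eval}(C, f) \subseteq F$.

TS    $\mathrm{eval}(C, f) \subseteq \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e)$
H0    $C \in \gamma(A)$
H1    $C \models c$
H2    $\#(\mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)) = 1$
H3    $\neg\bigl(\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f)\bigr)$
H4    Definition 3.2, the concretization function.
H5    Lemma 3.31, monotonicity of eval.
H6    Definition 3.9, value of conditions.
H7    Lemma 3.19, eval cardinality on $\mathcal{C}$.
D0    $\neg\bigl(\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \land \mathrm{eval}(C, e) \not\subseteq \mathrm{eval}(A, f)\bigr)$
D1    $\mathrm{eval}(C, e) \not\subseteq \mathrm{eval}(A, e) \lor \mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, f)$   (D0)
D2    $C \subseteq A$   (H0, H4)
D3    $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e)$   (D2, H5)
D4    $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \land \mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, f)$   (D3, D1)
D5    $\mathrm{eval}(C, e) \subseteq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$   (D4)
D6    $\#\mathrm{eval}(C, e) = 1$   (H7)
D7    $\mathrm{eval}(C, e) = \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$   (H2, D6, D5)
D8    $\mathrm{eval}(C, e) \neq \mathrm{eval}(C, f)$   (H1, H6)
D9    $\mathrm{eval}(C, f) \neq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$   (D6, D5, H2)
D10   $\#\mathrm{eval}(C, f) = 1$   (H7)
D11   $\mathrm{eval}(C, f) \not\subseteq \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$   (D10, D9)
D12   $\mathrm{eval}(C, f) \not\subseteq \mathrm{eval}(A, e) \lor \mathrm{eval}(C, f) \not\subseteq \mathrm{eval}(A, f)$   (D11)
D13   $\mathrm{eval}(C, f) \subseteq \mathrm{eval}(A, f)$   (D2, H5)
D14   $\mathrm{eval}(C, f) \subseteq \mathrm{eval}(A, f) \land \mathrm{eval}(C, f) \not\subseteq \mathrm{eval}(A, e)$   (D13, D12)
      $\mathrm{eval}(C, f) \subseteq \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e)$   (D14)

□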



Proof. (Correctness of the filter, Theorem 3.27.) Let $A \in \mathcal{A}$, let $c \in \mathrm{Cond}$ and let $C \in \mathcal{C}$. For convenience of notation let $I = \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)$.

TS   $C \in \gamma(\phi(A, c))$
H0   $C \models c$
H1   $C \in \gamma(A)$
H2   Lemma 3.50, correctness of filter 2.
H3   Definition 3.18, filter 3.
H4   Definition 3.2, concretization function.

We distinguish two cases:

$c = (\mathrm{eq}, e, f)$;   (C1)
$c = (\mathrm{neq}, e, f)$.   (C2)
For the first case (C1) let $c = (\mathrm{eq}, e, f)$.

H5   Lemma 3.51, the equality target.
D0   $\mathrm{eval}(C, e) \subseteq I$   (H5, H1, H0)
D1   $\mathrm{eval}(C, f) \subseteq I$   (H5, H1, H0)
D2   $C \in \gamma(\phi(A, I, e))$   (D0, H1, H2)
D3   $C \in \gamma(\phi(A, I, f))$   (D1, H1, H2)
D4   $C \subseteq \phi(A, I, e)$   (D2, H4)
D5   $C \subseteq \phi(A, I, f)$   (D3, H4)
D6   $\phi(A, c) = \phi(A, I, e) \cap \phi(A, I, f)$   (H3)
D7   $C \subseteq \phi(A, I, e) \cap \phi(A, I, f)$   (D5, D4)
D8   $C \subseteq \phi(A, c)$   (D7, D6)
✠    $C \in \gamma(\phi(A, c))$   (D8, H4)



For the second case (C2) let $c = (\mathrm{neq}, e, f)$. We distinguish two sub-cases:

$\# I \neq 1$;   (C2.1)
$\# I = 1$.   (C2.2)

For the first sub-case (C2.1)

H5   $\# I \neq 1$   (C2.1)
D0   $\phi(A, c) = A$   (H3, H5)
     $C \in \gamma(\phi(A, c))$   (D0, H1)



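In the second sub-case (C2.2), for convenience of notation let $E, F \subseteq \mathcal{L}$ be defined as

$E = \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f)$,
$F = \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e)$.

H5   $\# I = 1$   (C2.2)
H6   Lemma 3.52, the inequality target.
D0   $\phi(A, c) = \phi(A, E, e) \cup \phi(A, F, f)$   (H3, H5)
D1   $\mathrm{eval}(C, e) \subseteq E \lor \mathrm{eval}(C, f) \subseteq F$   (H6, H5, H1, H0)
D2   $\mathrm{eval}(C, e) \subseteq E \implies C \in \gamma(\phi(A, E, e))$   (H2, H1)
D3   $\mathrm{eval}(C, f) \subseteq F \implies C \in \gamma(\phi(A, F, f))$   (H2, H1)
D4   $C \in \gamma(\phi(A, E, e)) \lor C \in \gamma(\phi(A, F, f))$   (D1, D2, D3)
D5   $C \subseteq \phi(A, E, e) \lor C \subseteq \phi(A, F, f)$   (D4, H4)
D6   $C \subseteq \phi(A, E, e) \cup \phi(A, F, f)$   (D5)
D7   $C \subseteq \phi(A, c)$   (D6, D0)
     $C \in \gamma(\phi(A, c))$   (D7, H4)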



□ 

4. PRECISION LIMITS 

This section presents some considerations about the precision of the analysis, moving from questions that regard the points-to representation, and are therefore common to all points-to methods, to questions about the specific method presented.

4.1 Precision of the Points-To Representation

Reconsider now the correctness results presented in Theorems 3.26 and 3.27. Let $A \in \mathcal{A}$ and $a \in \mathrm{Assignments}$; the correctness of the assignment,

$\gamma(\mathrm{assign}(A, a)) \supseteq \mathrm{assign}(\gamma(A), a)$,

using the definition of the abstraction function (Definition 3.2) and Lemma 3.24, implies that

$\mathrm{assign}(A, a) \supseteq \alpha\bigl(\mathrm{assign}(\gamma(A), a)\bigr)$
$= \bigcup \{\, C \mid C \in \mathrm{assign}(\gamma(A), a) \,\}$
$= \bigcup \{\, \mathrm{assign}(C, a) \mid C \in \gamma(A) \,\}$.

Moreover, given $c \in \mathrm{Cond}$, the correctness of the filter,

$\gamma(\phi(A, c)) \supseteq \phi(\gamma(A), c)$,

using Lemma 3.24, implies that

$\phi(A, c) \supseteq \alpha\bigl(\phi(\gamma(A), c)\bigr)$
$= \bigcup \{\, C \mid C \in \phi(\gamma(A), c) \,\}$
$= \bigcup \{\, C \mid C \in \gamma(A) \cap \mathrm{modelset}(c) \,\}$.

Expressed in this form, the correctness results highlight the attribute independent nature of the points-to abstract domain; in this sense these results provide a limit to the precision attainable. Note that these limits,

$\bigcup \{\, \mathrm{assign}(C, a) \mid C \in \gamma(A) \,\}$,
$\bigcup \{\, C \mid C \in \gamma(A) \cap \mathrm{modelset}(c) \,\}$,

do not depend in any way on the definition of the abstract operations but only on the characteristics of the abstract and concrete domains (Definition 3.1), their semantics (Definition 3.2) and on the concrete semantics of the operations (Definitions 3.12 and 3.13). In other words, these are limitations of the points-to representation and are thus common to any method based on it. In Section 2.7 we have presented an example of the limitations of the alias query representation; now we show some examples of the limitations of the points-to representation, which is strictly less powerful.

Example 10. An abstract alias query is able to correctly represent the fact that two pointers point to the same location, even when the pointed location is not known. The points-to representation is unable to do so, as illustrated in Listing 25: at line 7 the abstract alias query that approximates the program is able to express that, in all of the possible executions, the expressions '*p' and '*q' are aliases, that is, the variables 'p' and 'q' point to the same location. On the other hand, the most precise points-to approximation cannot capture this fact.

1 int a, b, *p, *q;
2 if (...) p = &a;
3 else p = &b;
4 // eval(*p) = {a, b}
5 q = p;
6 // eval(*p) = eval(*q) = {a, b}
7 ...

Listing 25: in this example two executions are possible. However, in both of them at line 7 the pointers 'p' and 'q' point to the same location.

Let $A \in \mathcal{A}$ where

$\mathcal{L} = \{p, q, a, b\}$,
$A = \{(p, a), (p, b), (q, a), (q, b)\}$.

We have $\gamma(A) = \{C_0, C_1, C_2, C_3\}$ where

$C_0 = \{(p, a), (q, a)\}$,
$C_1 = \{(p, b), (q, b)\}$,
$C_2 = \{(p, b), (q, a)\}$,
$C_3 = \{(p, a), (q, b)\}$.

Consider the condition $(\mathrm{eq}, *p, *q) = c \in \mathrm{Cond}$; we have

$\phi(\gamma(A), c) = \gamma(A) \cap \mathrm{modelset}(c) = \{C_0, C_1\}$;

but the abstraction yields

$\alpha(\phi(\gamma(A), c)) = \alpha(\{C_0, C_1\}) = C_0 \cup C_1 = \{(p, a), (q, a), (p, b), (q, b)\} = A$.

The element $\alpha(\{C_0, C_1\})$ is the most precise points-to abstraction that approximates both $C_0$ and $C_1$; however, it also approximates $C_2$ and $C_3$, which are not models of the condition $c$. Again, this is due to the fact that the points-to representation is attribute independent: in the above example we are unable to record that when a concrete element $C$ is such that $C \models c$ and $(p, a) \in C$ then also $(q, a) \in C$. This situation is also illustrated in Figures 18 and 19.

In other words, this example shows that it is not possible to define the filter operation in such a way that it always filters away all the concrete points-to descriptions that are not models of the supplied condition $c$. In symbols:

$\neg\bigl(\forall A \in \mathcal{A} : \forall c \in \mathrm{Cond} : \gamma(\phi(A, c)) \subseteq \mathrm{modelset}(c)\bigr)$.
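The computation of Example 10 can be replayed mechanically. The following Python sketch is only an illustration, not the implementation discussed in this work: the helper names (`gamma`, `alpha`, `post`, `eval_expr`) and the encoding of an expression as a pair (location, number of dereferences) are assumptions made here. In particular, `gamma` follows the reading of the concretization suggested by the examples (every location with outgoing arcs keeps exactly one of them), which is an assumption about Definition 3.2.

```python
from itertools import product

def gamma(A):
    """Concretization as suggested by the examples (assumption): every
    location with outgoing arcs in A keeps exactly one of them."""
    sources = sorted({l for (l, _) in A})
    choices = [[(l, m) for (x, m) in sorted(A) if x == l] for l in sources]
    return [frozenset(pick) for pick in product(*choices)]

def alpha(descriptions):
    """Abstraction of a set of concrete descriptions: their union."""
    out = set()
    for C in descriptions:
        out |= C
    return out

def post(R, locs):
    """Locations pointed to by some location in locs."""
    return {m for (l, m) in R if l in locs}

def eval_expr(R, expr):
    """Evaluate an expression encoded as (location, number of '*')."""
    loc, stars = expr
    result = {loc}
    for _ in range(stars):
        result = post(R, result)
    return result

A = {("p", "a"), ("p", "b"), ("q", "a"), ("q", "b")}
star_p, star_q = ("p", 1), ("q", 1)

# Concrete filter: keep only the models of the condition (eq, *p, *q).
models = [C for C in gamma(A) if eval_expr(C, star_p) == eval_expr(C, star_q)]

# Abstracting the two surviving descriptions gives back the whole of A.
abstraction = alpha(models)
```

Here `models` contains only the two descriptions in which 'p' and 'q' agree, yet `abstraction` equals `A`, whose concretization again has four elements: the correlation between the two arcs cannot be stored.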
[Figure 18; graphs and alias tables not reproduced. First panel: a graphical representation of the points-to information $C_0 \in \mathcal{C}$ and an extract of the concrete alias query $\mathrm{alias}_{m_0}$ induced by the concrete memory description $m_0 \in \mathrm{Mem}$. Second panel: as above, on the concrete memory description $m_1 \in \mathrm{Mem}$. Third panel: a graphical representation of the points-to abstraction $\alpha(\{C_0, C_1\}) = C_0 \cup C_1 = A \in \mathcal{A}$ and an extract of the abstract alias query $\alpha(\{\mathrm{alias}_{m_0}, \mathrm{alias}_{m_1}\}) = \mathrm{alias}_{m_0} \cup \mathrm{alias}_{m_1} = \mathrm{alias}^\sharp \in \mathrm{AliasQ}^\sharp$. Note that $\mathrm{alias}^\sharp(*p, *q) = 1$, that is, the abstract alias query is able to represent that 'p' and 'q' always point to the same location.]

Fig. 18: a representation of the points-to and alias information associated to the code in Listing 25.

Example 11. The points-to representation keeps track only of the relations between pointers and pointed objects that span exactly one level of indirection. For example, in Listing 26, the points-to representation is unable to natively express that '**r' is an alias of 'b'; this information, though present in the complete alias relation, is inferred from the points-to pairs explicitly memorized by applying the transitive property: it is known that 'r' points to 'p' and that 'p' points to 'b'; then it can be deduced that '*r' points to 'b'. But this step causes a loss of accuracy when there are more intermediate variables (Figure 20). The alias query representation is able to describe that after the execution of line 9 the expression '**r' is definitely an alias of 'b', whereas the points-to representation fails to do so. Let $A \in \mathcal{A}$ such
[Figure 19; graphs and tables not reproduced. First panel: both the table and the graph represent the points-to information $A$. Note that $\#\mathrm{eval}(A, *p) > 1$; that is (Definition 3.7) $A(*p, *q) = \top$, i.e., the points-to representation is unable to express that 'p' and 'q' will definitely point to the same location. Second panel: the concrete points-to information $C_2 \in \mathcal{C}$ is a spurious element of $\gamma(A)$. Note that $\mathrm{alias}_{C_2} \notin \gamma(\mathrm{alias}^\sharp)$; that is, the concrete alias relation $\mathrm{alias}_{C_2}$ induced by $C_2$ would not be generated using the alias representation. Third panel: the concrete points-to information $C_3 \in \mathcal{C}$ is another spurious element of $\gamma(A)$. Note that $\gamma(A) = \{C_0, C_1, C_2, C_3\}$ and $\gamma(\mathrm{alias}^\sharp) = \{\mathrm{alias}_{C_0}, \mathrm{alias}_{C_1}\}$.]

Fig. 19: continuation of Figure 18.



that

$\mathcal{L} = \{p, q, r, a, b, c\}$,
$A = \{(r, p), (r, q), (p, a), (q, c)\}$.

We have that $\{C_0, C_1\} = \gamma(A)$ where

$C_0 = \{(r, p), (p, a), (q, c)\}$,
$C_1 = \{(r, q), (p, a), (q, c)\}$.

Let $(*r, b) = x \in \mathrm{Assignments}$. Performing the assignment on the elements found
[Figure 20; two points-to graphs, labelled "Before." and "After.", not reproduced.]

Fig. 20: a representation of the points-to information before and after the execution of line 9 in Listing 26.



[Figure 21; graphs and alias tables not reproduced. First panel: a graphical representation of the points-to information $C_0 \in \mathcal{C}$ and an extract of the concrete alias query $\mathrm{alias}_{m_0}$ induced by the concrete memory description $m_0 \in \mathrm{Mem}$. Second panel: as above, on the concrete memory description $m_1$.]

Fig. 21: this example shows how, in the code of Listing 26, the points-to representation fails to describe that '**r' is an alias of 'b' on all of the possible executions.
1 int a, b, c, *p, *q, **r;
2 p = &a;
3 // eval(*p) = {a}
4 q = &c;
5 // eval(*q) = {c}
6 if (...) r = &p;
7 else r = &q;
8 // eval(*r) = {p, q}
9 *r = &b;
10 // eval(*p) = {a, b}
11 // eval(*q) = {b, c}

Listing 26: in this example two executions are possible. However, in both of them at line 10 the expression '**r' is an alias of 'b'.



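in the concretization of $A$ we obtain

$\mathrm{assign}(C_0, x) = \{(r, p), (p, b), (q, c)\}$,
$\mathrm{assign}(C_1, x) = \{(r, q), (p, a), (q, b)\}$.

Computing the abstraction of the result of the concrete operation we find

$\alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr) = \alpha\bigl(\{\mathrm{assign}(C_0, x), \mathrm{assign}(C_1, x)\}\bigr)$
$= \mathrm{assign}(C_0, x) \cup \mathrm{assign}(C_1, x)$
$= A \cup \{(p, b), (q, b)\}$.

Let

$C_3 = \{(r, p), (p, a), (q, b)\} \subseteq \alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr)$;

note that

H0   Theorem 3.26, correctness of the assignment.
H1   Lemma 3.21, monotonicity of the concretization function.
H2   Lemma 3.24, the abstraction effect.
H3   Definition 3.22, the abstraction function.
D0   $\mathrm{assign}(\gamma(A), x) \subseteq \gamma(\mathrm{assign}(A, x))$   (H0)
D1   $\alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr) \subseteq \alpha\bigl(\gamma(\mathrm{assign}(A, x))\bigr)$   (D0, H3)
D2   $\alpha\bigl(\gamma(\mathrm{assign}(A, x))\bigr) \subseteq \mathrm{assign}(A, x)$   (H2)
D3   $\alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr) \subseteq \mathrm{assign}(A, x)$   (D1, D2)
✠    $\gamma\bigl(\alpha(\mathrm{assign}(\gamma(A), x))\bigr) \subseteq \gamma(\mathrm{assign}(A, x))$   (D3, H1)

then

$C_3 \in \gamma(\mathrm{assign}(A, x))$;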
[Figure 22; graphs and tables not reproduced. First panel: a graphical representation of the most precise points-to abstraction $\alpha(\{C_0, C_1\}) = C_0 \cup C_1 = A \in \mathcal{A}$ and an extract of the abstract alias query $\alpha(\{\mathrm{alias}_{m_0}, \mathrm{alias}_{m_1}\}) = \mathrm{alias}_{m_0} \cup \mathrm{alias}_{m_1} = \mathrm{alias}^\sharp \in \mathrm{AliasQ}^\sharp$. Note that $\mathrm{alias}^\sharp(**r, b) = 1$, that is, the abstract alias query is able to represent that the expressions '**r' and 'b' are definitely aliases. Second panel: both the table and the graph represent the points-to information $A$. Note that $\mathrm{eval}(A, **r) \neq \mathrm{eval}(A, b)$, that is (Definition 3.7) $A(**r, b) = \top$, i.e., the points-to representation is unable to express that '**r' and 'b' will definitely be aliases.]

Fig. 22: continuation of Figure 21.



but

$C_3 \neq \mathrm{assign}(C_0, x)$;
$C_3 \neq \mathrm{assign}(C_1, x)$;

that is, there exists no concrete element $C \in \gamma(A)$ such that $C_3 = \mathrm{assign}(C, x)$.

Again this inaccuracy is due to the lack of relational information in the points-to representation: in this example, given a concrete element $C \in \gamma(\mathrm{assign}(A, x))$, we are unable to tell that if $(r, p) \in C$ then $(p, b) \in C$ and $(q, b) \notin C$. The situation just described is illustrated in Figures 21, 22 and 23.

In other words, this example shows that it is not possible to formulate the assignment operation in such a way that each concrete element approximated by
[Figure 23; graph and alias table not reproduced. Panel: the concrete points-to information $C_2 \in \mathcal{C}$ is a spurious element of $\gamma(A)$. Note that $\mathrm{alias}_{C_2} \notin \gamma(\mathrm{alias}^\sharp)$; that is, the concrete alias relation $\mathrm{alias}_{C_2}$ induced by $C_2$ would not be generated using the alias representation.]

Fig. 23: continuation of Figure 22.



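$\mathrm{assign}(A, a)$ can be expressed as the result of the concrete assignment performed on one of the elements of $\gamma(A)$. In symbols:

$\neg\bigl(\forall A \in \mathcal{A} : \forall a \in \mathrm{Assignments} : \forall C \in \gamma(\mathrm{assign}(A, a)) : \exists D \in \gamma(A) \;.\; C = \mathrm{assign}(D, a)\bigr)$.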
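The spurious element of Example 11 can also be exhibited by brute force. The Python sketch below is an illustration under stated assumptions: the helper names and the encoding of expressions as (location, number of dereferences) pairs are hypothetical; `gamma` assumes that every location with outgoing arcs keeps exactly one of them; `assign_concrete` assumes that a concrete assignment overwrites the single arc leaving the location denoted by the lhs.

```python
from itertools import product

def gamma(A):
    """Assumed concretization: each source location keeps exactly one arc."""
    sources = sorted({l for (l, _) in A})
    choices = [[(l, m) for (x, m) in sorted(A) if x == l] for l in sources]
    return [frozenset(pick) for pick in product(*choices)]

def post(R, locs):
    return {m for (l, m) in R if l in locs}

def eval_expr(R, expr):
    """Evaluate an expression encoded as (location, number of '*')."""
    loc, stars = expr
    result = {loc}
    for _ in range(stars):
        result = post(R, result)
    return result

def assign_concrete(C, lhs, rhs):
    """Concrete assignment (assumed strong update): both sides denote
    exactly one location; the arc leaving the target is replaced."""
    (target,) = eval_expr(C, lhs)
    (value,) = eval_expr(C, rhs)
    kept = {(l, m) for (l, m) in C if l != target}
    return frozenset(kept | {(target, value)})

A = {("r", "p"), ("r", "q"), ("p", "a"), ("q", "c")}
lhs, rhs = ("r", 1), ("b", 0)          # the assignment x = (*r, b)

results = {assign_concrete(C, lhs, rhs) for C in gamma(A)}
abstraction = set().union(*results)     # alpha of the concrete results

# The candidate spurious description discussed in the text.
C3 = frozenset({("r", "p"), ("p", "a"), ("q", "b")})
```

With these definitions `abstraction` is `A` extended with the arcs (p, b) and (q, b); `C3` belongs to `gamma(abstraction)` although it is not in `results`.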

4.2 Precision of the Presented Method 

The two examples introduced above present a limitation of the form: all points-to based methods are not precise enough to capture this fact. In terms of the partial order of the domain this can be seen as a lower limit to the precision attainable with points-to based methods. On the other hand it is also interesting to find out what the upper limits to the precision of the proposed method are, i.e., statements of the form: the given points-to based method is precise enough to capture that fact. In particular, we want to analyze the situation of the presented method with respect to the limitations of the points-to representation, that is, whether or not the inclusions in Theorems 3.26 and 3.27 are also equalities, i.e., whether it holds that, for all $A \in \mathcal{A}$, $e \in \mathrm{Expr}$, $a \in \mathrm{Assignments}$ and $c \in \mathrm{Cond}$,

$\bigcup \{\, \mathrm{eval}(C, e) \mid C \in \gamma(A) \,\} \supseteq \mathrm{eval}(A, e)$;
$\mathrm{assign}(\gamma(A), a) \supseteq \gamma(\mathrm{assign}(A, a))$;
$\phi(\gamma(A), c) \supseteq \gamma(\phi(A, c))$.

From the characterization presented in Section 4.1 these can be rewritten to stress the attribute independent nature of the points-to representation, i.e., by focusing on the single arcs instead of the whole points-to relation. Let $A \in \mathcal{A}$ be such that $\gamma(A) \neq \emptyset$; then we have

$\forall l \in \mathrm{eval}(A, e) : \exists C \in \gamma(A) \;.\; \mathrm{eval}(C, e) = \{l\}$,
$\forall (l, m) \in \mathrm{assign}(A, a) : \exists C \in \gamma(A) \;.\; (l, m) \in \mathrm{assign}(C, a)$,
$\forall (l, m) \in \phi(A, c) : \exists C \in \gamma(A) \;.\; C \in \mathrm{modelset}(c) \land (l, m) \in C$,

respectively. Unfortunately, for all these cases there exists a counterexample.

4.2.1 The Abstract Evaluation Is Not Optimal. The following example highlights that the abstract evaluation function (Definition 3.6) is not optimal with respect to the points-to representation, i.e., there exist $A \in \mathcal{A}$, $\gamma(A) \neq \emptyset$, and $e \in \mathrm{Expr}$ such that

$\mathrm{eval}(A, e) \setminus \bigcup \{\, \mathrm{eval}(C, e) \mid C \in \gamma(A) \,\} \neq \emptyset$.

Example 12. Let $A \in \mathcal{A}$ such that

$\mathcal{L} = \{a, b, c\}$,
$A = \{(a, a), (a, b), (b, c)\}$.

We have that $\{C_1, C_2\} = \gamma(A)$ where

$C_1 = \{(a, a), (b, c)\}$,
$C_2 = \{(a, b), (b, c)\}$.

Consider the expression $e = **a$. Performing the evaluation of $e$ as described in Definition 3.6 we obtain

i   eval(C_1, e, i)   eval(C_2, e, i)   eval(A, e, i)
2   {a}               {a}               {a}
1   {a}               {b}               {a, b}
0   {a}               {c}               {a, b, c}

Note that $b \in \mathrm{eval}(A, e)$ but there exists no $C \in \gamma(A)$ such that $\{b\} = \mathrm{eval}(C, e)$; indeed

$\bigcup \{\, \mathrm{eval}(C, e) \mid C \in \gamma(A) \,\} = \{a, c\}$.

The spurious location $b$ in the result of the evaluation of the expression $e$ in $A$ is due to the fact that the formulation of the abstract evaluation does not exploit that in a concrete points-to description a location can point to only one location; in this case there exists no $C \in \gamma(A)$ such that $\{(a, a), (a, b)\} \subseteq C$. A graphical representation of this example is reported in Figures 24 and 25.
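A small Python sketch makes the gap of Example 12 explicit. It is an illustration only: the function names and the (location, number of dereferences) encoding of expressions are assumptions, and `gamma` assumes that each location with outgoing arcs keeps exactly one of them.

```python
from itertools import product

def gamma(A):
    """Assumed concretization: each source location keeps exactly one arc."""
    sources = sorted({l for (l, _) in A})
    choices = [[(l, m) for (x, m) in sorted(A) if x == l] for l in sources]
    return [frozenset(pick) for pick in product(*choices)]

def post(R, locs):
    return {m for (l, m) in R if l in locs}

def eval_expr(R, expr):
    """Evaluate (location, number of '*') by following arcs step by step."""
    loc, stars = expr
    result = {loc}
    for _ in range(stars):
        result = post(R, result)
    return result

A = {("a", "a"), ("a", "b"), ("b", "c")}
e = ("a", 2)                                  # the expression **a

abstract_value = eval_expr(A, e)
concrete_values = set()
for C in gamma(A):
    concrete_values |= eval_expr(C, e)

spurious = abstract_value - concrete_values
```

The abstract evaluation follows both arcs leaving 'a' at the first step and again at the second one, so `spurious` contains exactly the location 'b'.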

4.2.2 The Abstract Assignment Is Not Optimal. We present another example 
that highlights how the abstract assignment operation formulated in Definition 3.10 



Footnote: Note that the additional hypothesis, $\gamma(A) \neq \emptyset$, is required by Lemma 3.24 to prove the opposite of the inclusions used for the correctness results.
[Figure 24; graphs not reproduced. Panels: the abstract memory $A$, with $\gamma(A) = \{C_1, C_2\}$; the concrete memory description $C_1$; the concrete memory description $C_2$.]

Fig. 24: the abstract evaluation function is not optimal.

[Figure 25; graphs not reproduced. Panels: eval(2), eval(1), eval(0).]

Fig. 25: the evaluation process of the expression $**a$ on the memory $A$ of Example 12. An optimal evaluation function would not follow the arc $(a, b)$ between $i = 1$ and $i = 0$ (the dashed arc in the figure).



is not optimal for the points-to representation, i.e., there exist $A \in \mathcal{A}$, $\gamma(A) \neq \emptyset$, and $a \in \mathrm{Assignments}$ such that

$\mathrm{assign}(A, a) \setminus \alpha\bigl(\mathrm{assign}(\gamma(A), a)\bigr) \neq \emptyset$.

Note that this limitation still holds even assuming an optimal abstract evaluation function.



Example 13. Let $A \in \mathcal{A}$ such that

$\mathcal{L} = \{a, b, c\}$,
$A = \{(a, b), (a, c)\}$.

We have that $\{C_1, C_2\} = \gamma(A)$ where

$C_1 = \{(a, b)\}$,
$C_2 = \{(a, c)\}$.

Let $(*a, *a) = x \in \mathrm{Assignments}$. Performing the assignment $x$ on the elements of $\gamma(A)$ we obtain

$\mathrm{assign}(C_1, x) = \{(a, b), (b, b)\}$,
$\mathrm{assign}(C_2, x) = \{(a, c), (c, c)\}$.

Computing the abstraction of the result of the concrete operation we find

$\alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr) = \alpha\bigl(\{\mathrm{assign}(C_1, x), \mathrm{assign}(C_2, x)\}\bigr)$
$= \mathrm{assign}(C_1, x) \cup \mathrm{assign}(C_2, x)$
$= A \cup \{(b, b), (c, c)\}$.

Note that performing the abstract evaluation of the lhs and the rhs of the assignment as described in Definition 3.6 yields $\mathrm{eval}(A, *a) = \{b, c\}$, which is the most precise result possible for the abstract evaluation of the expression $*a$; indeed

$\bigcup \{\, \mathrm{eval}(C, *a) \mid C \in \gamma(A) \,\} = \mathrm{eval}(C_1, *a) \cup \mathrm{eval}(C_2, *a)$
$= \{b\} \cup \{c\} = \{b, c\}$
$= \mathrm{eval}(A, *a)$.

In this case, the abstract assignment (Definition 3.10) yields

$\mathrm{assign}(A, x) = A \cup \mathrm{eval}(A, *a) \times \mathrm{eval}(A, *a)$
$= A \cup \{b, c\} \times \{b, c\}$
$= A \cup \{(b, b), (b, c), (c, b), (c, c)\}$.

Note that

$\mathrm{assign}(A, x) \setminus \alpha\bigl(\mathrm{assign}(\gamma(A), x)\bigr) = \{(b, c), (c, b)\}$.

The arcs $\{(b, c), (c, b)\}$ do not correspond to any concrete assignment: they are artifacts of this abstraction. Note, however, that here the inaccuracy cannot be ascribed to the abstract evaluation of the expressions, which in this case exhibits an optimal behaviour. The problem is that the evaluations of the rhs and of the lhs of the assignment are not related to each other: this way it becomes possible that the lhs evaluates to 'b' and the rhs evaluates to 'c', thus generating the spurious arc $(b, c)$, even when the rhs and the lhs are the same expression. This example is illustrated in Figure 26.
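The two spurious arcs of Example 13 can be recomputed with a few lines of Python. This is a sketch under assumptions, not the formulation of Definition 3.10: `assign_abstract` implements only the weak-update formula used in the example (the union of A with the product of the two evaluations), `assign_concrete` assumes a strong update on the single location denoted by the lhs, and all names and the expression encoding are hypothetical.

```python
from itertools import product

def gamma(A):
    """Assumed concretization: each source location keeps exactly one arc."""
    sources = sorted({l for (l, _) in A})
    choices = [[(l, m) for (x, m) in sorted(A) if x == l] for l in sources]
    return [frozenset(pick) for pick in product(*choices)]

def post(R, locs):
    return {m for (l, m) in R if l in locs}

def eval_expr(R, expr):
    loc, stars = expr
    result = {loc}
    for _ in range(stars):
        result = post(R, result)
    return result

def assign_concrete(C, lhs, rhs):
    """Concrete assignment (assumed): replace the arc leaving the target."""
    (target,) = eval_expr(C, lhs)
    (value,) = eval_expr(C, rhs)
    return frozenset({(l, m) for (l, m) in C if l != target} | {(target, value)})

def assign_abstract(A, lhs, rhs):
    """Weak update only, as in the example: lhs and rhs are evaluated
    independently and every combination becomes an arc."""
    return set(A) | set(product(eval_expr(A, lhs), eval_expr(A, rhs)))

A = {("a", "b"), ("a", "c")}
star_a = ("a", 1)                         # the assignment x = (*a, *a)

best = set()                              # alpha(assign(gamma(A), x))
for C in gamma(A):
    best |= assign_concrete(C, star_a, star_a)

spurious = assign_abstract(A, star_a, star_a) - best
```

The independent evaluation of the two sides is visible in `assign_abstract`: the product pairs 'b' with 'c' even though both sides are the same expression.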

4.2.3 The Abstract Filter Is Not Optimal. Finally, we report an example that shows the same inaccuracy in the filter operation, i.e., there exist $A \in \mathcal{A}$, $\gamma(A) \neq \emptyset$, and $c \in \mathrm{Cond}$ such that

$\phi(A, c) \setminus \alpha\bigl(\phi(\gamma(A), c)\bigr) \neq \emptyset$.



S. Soffia



[Figure 26; graphs not reproduced. Panels: the abstraction $A$ before the execution of the assignment $x = (*a, *a)$, with $\gamma(A) = \{C_1, C_2\}$; the concrete memory description $C_1$; the concrete memory description $\mathrm{assign}(C_1, x)$; the concrete memory description $C_2$; the concrete memory description $\mathrm{assign}(C_2, x)$; the abstract memory $\alpha(\{\mathrm{assign}(C_1, x), \mathrm{assign}(C_2, x)\}) = \mathrm{assign}(C_1, x) \cup \mathrm{assign}(C_2, x)$; the abstraction $\mathrm{assign}(A, x)$ resulting from the execution of the assignment, where the spurious arcs are $\{(b, c), (c, b)\}$.]

Fig. 26: the abstract assignment formulation is not optimal.
Example 14. Let $A \in \mathcal{A}$ such that

$\mathcal{L} = \{a, b\}$,
$A = \{(a, a), (a, b), (b, b)\}$.

We have that $\{C_1, C_2\} = \gamma(A)$ where

$C_1 = \{(a, a), (b, b)\}$,
$C_2 = \{(a, b), (b, b)\}$.

Consider now the condition $(\mathrm{eq}, **a, b) = c \in \mathrm{Cond}$. Since

$\mathrm{eval}(C_1, **a) = \{a\}$,
$\mathrm{eval}(C_2, **a) = \{b\}$,

only $C_2$ satisfies $c$, i.e.,

$\phi(\gamma(A), c) = \gamma(A) \cap \mathrm{modelset}(c) = \{C_2\}$.

Performing the filter operation as described in Definition 3.18 on $A$ we do not improve the precision, that is, $\phi(A, c) = A$.

i   eval(A, **a, i)   targ(A, **a, i)   Removed arcs
2   {a}               {a}
1   {a, b}            {a, b}
0   {a, b}            {b}

Then note that

$\phi(A, c) \setminus \alpha(\phi(\gamma(A), c)) = A \setminus C_2 = \{(a, a)\}$.

This means that the filter is unable to remove the spurious arc $(a, a)$. A graphical representation of this situation is presented in Figure 27, while Figure 28 presents a graphical representation of the filter computation.

Though the current formulation of the filter operation is not optimal, in the next example we show that, by iterating the application of the filter on the same condition, it is possible to refine the points-to approximation.

Example 15. Let $A \in \mathcal{A}$ such that

$\mathcal{L} = \{a, b, c\}$,
$A = \{(a, a), (a, b), (a, c), (b, c), (c, a)\}$.

[Figure; graphs not reproduced. Panels: Initial; Iteration One; Iteration Two.]
[Figure 27; graphs not reproduced. Panels: the abstraction $A$, with $\gamma(A) = \{C_1, C_2\}$ and $c = (**a, b)$; the concrete memory description $C_1$, which is not a model of $c$; the concrete memory description $C_2$, which is a model of $c$; the abstraction $\phi(A, c) = A$, where the spurious arc is $\{(a, a)\}$.]

Fig. 27: the abstract filter formulation is not optimal.

[Figure 28; graphs not reproduced. Panels: eval(2), eval(1), eval(0); targ(2), targ(1), targ(0).]

Fig. 28: in Example 14 the filter is unable to remove the spurious arc $(a, a)$.




Consider the condition $x = (\mathrm{eq}, **a, c) \in \mathrm{Cond}$. From the definition of the evaluation function (Definition 3.6), we have

$\mathrm{eval}(A, **a) = \{a, b, c\}$,
$\mathrm{eval}(A, c) = \{c\}$.

From the filter definition (Definition 3.18) we have

$I = \mathrm{eval}(A, **a) \cap \mathrm{eval}(A, c) = \{c\}$,
$\phi(A, x) = \phi(A, I, **a) \cap \phi(A, I, c) = \phi(A, \{c\}, **a) \cap \phi(A, \{c\}, c)$.

We consider only the lhs $\phi(A, \{c\}, **a)$ since, from the definition of the filter 2, it is clear that filtering on the rhs does not improve the precision of the approximation, that is, $\phi(A, \{c\}, c) = A$. We have

i   eval(A, **a, i)   targ(A, {c}, **a, i)   phi(A, {c}, **a, i)
2   {a}               {a}                    A \ {(a, c)}
1   {a, b, c}         {a, b}                 A
0   {a, b, c}         {c}                    A

That is, with the first application of the filter we can remove the spurious arc $(a, c)$. Now we proceed by applying the filter again. Let $B = \phi(A, \{c\}, **a) = A \setminus \{(a, c)\}$. We have

i   eval(B, **a, i)   targ(B, {c}, **a, i)   phi(B, {c}, **a, i)
2   {a}               {a}                    B \ {(a, a)}
1   {a, b}            {b}                    B
0   {a, b, c}         {c}                    B

Note that in the second application of the filter we are able to remove another arc, $(a, a)$, that was not removed during the first iteration.
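The tables of Examples 14 and 15 can be reproduced with the following Python sketch. It is a reconstruction for illustration, not the implementation of Definitions 3.16 and 3.17: the step rule (when the target set at level j + 1 is a singleton, remove its arcs towards locations outside the target set at level j) is taken from the proof of Lemma 3.49, while the function names, the (location, number of dereferences) encoding and the restriction to expressions with at least one dereference are assumptions made here.

```python
def post(R, locs):
    return {m for (l, m) in R if l in locs}

def prev(R, locs):
    return {l for (l, m) in R if m in locs}

def ext_eval(R, expr, i):
    """Extended eval: value of expr once its i outermost '*' are dropped."""
    loc, stars = expr
    result = {loc}
    for _ in range(stars - i):
        result = post(R, result)
    return result

def filter_expr(B, M, expr, locations):
    """Sketch of the filter of B on expr with respect to the set M,
    for an expression with at least one dereference."""
    _, stars = expr
    targ = M & ext_eval(B, expr, 0)                      # targ at level 0
    result = set(B)
    for j in range(stars):
        nxt = ext_eval(B, expr, j + 1) & prev(B, targ)   # targ at level j+1
        if len(nxt) == 1:
            # The only possible source must point inside the previous target.
            result -= {(l, m) for l in nxt for m in locations - targ}
        targ = nxt
    return result

# Example 14: nothing is removed.
A14 = {("a", "a"), ("a", "b"), ("b", "b")}
kept14 = filter_expr(A14, {"b"}, ("a", 2), {"a", "b"})

# Example 15: two applications remove (a, c) and then (a, a).
locs = {"a", "b", "c"}
A15 = {("a", "a"), ("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")}
first = filter_expr(A15, {"c"}, ("a", 2), locs)
second = filter_expr(first, {"c"}, ("a", 2), locs)
third = filter_expr(second, {"c"}, ("a", 2), locs)
```

In this sketch a third application leaves the approximation unchanged, so for this input the iteration stabilizes after two steps.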



[Figure; graphs not reproduced. Panels: eval(2), eval(1), eval(0) and targ(2), targ(1), targ(0) for the first iteration and for the second iteration.]



4.2.4 Another Consideration on the Precision of the Filter Operation. It is possible to show that the formulation of the abstract filter operation does not generate spurious memory descriptions not already present in the initial approximation, i.e., for all $A \in \mathcal{A}$ and $c \in \mathrm{Cond}$,

$\gamma(\phi(A, c)) \subseteq \gamma(A)$.

Note that by composing this result with the result of correctness for the filter (Theorem 3.27) it is possible to write

$\gamma(A) \cap \mathrm{modelset}(c) \subseteq \gamma(\phi(A, c)) \subseteq \gamma(A)$.

Basically, the filter never adds new arcs, so it is not possible to obtain an approximation worse than the one given in input. Though the idea is quite simple, for completeness we report a formal proof.

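Lemma 4.1. (Filter upper bound 1.) Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$ and $n \in \mathbb{N}$; then

$\phi(A, M, e, n) \subseteq A$.

Proof. Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$ and $n \in \mathbb{N}$.

TS   $\phi(A, M, e, n) \subseteq A$
H0   Definition 3.16, filter 1.

We proceed inductively on $n$. For the first case we assume $n = 0$.

D0   $\phi(A, M, e, 0) = A$   (H0)
✠    $\phi(A, M, e, 0) \subseteq A$   (D0)

Now the inductive case.

H1   $\phi(A, M, e, n) \subseteq A$   (ind. hyp.)
D0   $\phi(A, M, e, n + 1) = \phi(A, M, e, n) \setminus \ldots$   (H0)
D1   $\phi(A, M, e, n + 1) \subseteq \phi(A, M, e, n)$   (D0)
     $\phi(A, M, e, n + 1) \subseteq A$   (D1, H1)

□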

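Lemma 4.2. (Filter upper bound 2.) Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$; then

$\phi(A, M, e) \subseteq A$.

Proof. Let $A \in \mathcal{A}$, $M \subseteq \mathcal{L}$, $e \in \mathrm{Expr}$. Following the definition of the filter 2 (Definition 3.17) we consider separately two cases. For the first case let $e = l \in \mathcal{L}$; if $\mathrm{eval}(A, l) \subseteq M$ then we have $\phi(A, M, l) = A$, otherwise $\phi(A, M, l) = \bot$. In both cases we have the thesis. For the second case let $e \in \mathrm{Expr} \setminus \mathcal{L}$.

TS   $\phi(A, M, e) \subseteq A$
H0   Definition 3.17, filter 2.
H1   Lemma 4.1, filter upper bound 1.
D0   $\phi(A, M, e) = \bigcap_{n \in \mathbb{N}} \phi(A, M, e, n)$   (H0)
D1   $\forall n \in \mathbb{N} : \phi(A, M, e, n) \subseteq A$   (H1)
D2   $\bigcap_{n \in \mathbb{N}} \phi(A, M, e, n) \subseteq A$   (D1)
     $\phi(A, M, e) \subseteq A$   (D2, D0)

□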

Lemma 4.3. (Filter upper bound 3.) Let $A \in \mathcal{A}$ and $c \in \mathrm{Cond}$; then

$\gamma(\phi(A, c)) \subseteq \gamma(A)$.

Proof. Let $A \in \mathcal{A}$ and let $c \in \mathrm{Cond}$.

TS   $\phi(A, c) \subseteq A$
H0   Definition 3.18, filter 3.
H1   Definition 3.2, concretization function.
H2   Lemma 4.2, filter upper bound 2.

As in the definition of the filter (Definition 3.18) we distinguish two cases:

$c = (\mathrm{eq}, e, f)$;   (C1)
$c = (\mathrm{neq}, e, f)$.   (C2)

For the first case (C1) we have

D0   $\phi(A, (\mathrm{eq}, e, f)) = \phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), e) \cap \phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), f)$   (H0)
D1   $\phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), e) \subseteq A$   (H2)
D2   $\phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), f) \subseteq A$   (H2)
D3   $\phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), e) \cap \phi(A, \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f), f) \subseteq A$   (D1, D2)
✠    $\phi(A, (\mathrm{eq}, e, f)) \subseteq A$   (D3, D0)

Now the second case (C2). If $\#(\mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)) \neq 1$, from H0 we have that $\phi(A, (\mathrm{neq}, e, f)) = A$, so the thesis is trivially verified. Otherwise assume $\#(\mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)) = 1$. Then we have

H3   $\#(\mathrm{eval}(A, e) \cap \mathrm{eval}(A, f)) = 1$
D0   $\phi(A, (\mathrm{neq}, e, f)) = \phi(A, \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f), e) \cup \phi(A, \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e), f)$   (H3, H0)
D1   $\phi(A, \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f), e) \subseteq A$   (H2)
D2   $\phi(A, \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e), f) \subseteq A$   (H2)
D3   $\phi(A, \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f), e) \cup \phi(A, \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e), f) \subseteq A$   (D1, D2)
     $\phi(A, (\mathrm{neq}, e, f)) \subseteq A$   (D3, D0)

From the definition of the concretization function (H1) we have that

$\phi(A, c) \subseteq A \implies \gamma(\phi(A, c)) \subseteq \gamma(A)$.

Since we have just proved the antecedent of this implication, we have the truth of the consequent, which is the thesis. □

4.3 A Final Consideration 

As stated in the first few lines of this section, the presented model is intentionally simplified to ease the presentation and the proofs. However, these concepts can be generalized to treat more complex environments and languages. In Listing 27 we present an example, taken from the test suite of our implementation of the algorithms, that shows a more realistic implementation of the situation presented in Example 15. This example shows how, using recursive data structures, it is possible to generate the points-to relations presented in the previous examples:



struct L {
  struct L *next;
  int value;
};

int main() {
  struct L a, b, c;

  if (...) a.next = &a;
  else
    if (...) a.next = &b;
    else a.next = &c;

  b.next = &c;
  c.next = &a;

  if (a.next->next == &c) {

  }
}


Listing 27: An example of code that shows the incompleteness of the filter algorithm 
using a recursive data structure. 



in particular loops and locations pointing to themselves, which are quite uncommon 
to see using only basic types. 

5. THE EXTENDED ABSTRACT MEMORY MODEL 

With the aim of presenting a realistic points-to analysis, this section discusses some
extensions to the simplified model previously introduced. More precisely, this section
describes a more realistic memory model by augmenting the previously described
domains with some details that are not directly related to the points-to problem
but are necessary for the definition of a working memory.

5.1 Abstract and Concrete Locations 

One of the main limitations of the formal model presented in Section 3 is the
assumption that the concrete and the abstract domains share the same set of
locations $\mathcal{L}$. No abstract domain that aims to be practically applicable can rely
on this assumption. From the definitions in Section 3, for every variable
created in a concrete execution there must be a distinct location in the abstract
memory description. This is obviously a problem since, with the use of recursion
and dynamic allocation, the number of variables created during a concrete execution
can be unbounded. Even when the number of variables is known statically, a
one-to-one approximation is usually infeasible; consider for instance the case
of arrays: under this assumption an abstract memory would be required to represent
every element of an array with a distinct location. Typically, real implementations
use one abstract location to approximate a set of concrete locations. For instance,
a simple strategy is to approximate all the elements of an array, independently
of their number, with the same abstract location. Previously we have used the
symbol $\mathcal{L}$ to denote the set of locations. From now on we denote with $\mathcal{L}$ the set
of the concrete locations and with $\mathcal{L}^\sharp$ a set that we call the abstract location set.
We still formalize the concrete domain as the complete lattice generated by the
powerset of the total functions $\mathcal{L} \to \mathcal{L}$. However, we have to adapt the definition
of the abstract domain as follows.

Definition 5.1. (Extended abstract domain.) Let $\mathcal{A}$, the support set of the
abstract domain, be defined as

\[
\mathcal{A} \overset{\mathrm{def}}{=} \Lambda \times \wp(\mathcal{L}^\sharp \times \mathcal{L}^\sharp).
\]

In words, an element $A \in \mathcal{A}$ is a pair $(f, P)$ where $f \in \Lambda$ represents the abstraction
function from the concrete to the abstract locations and $P \subseteq \mathcal{L}^\sharp \times \mathcal{L}^\sharp$ is an abstract
points-to relation. We call abstract domain the complete lattice

\[
(\mathcal{A}, \sqsubseteq, \sqcup, \sqcap, \bot, \top),
\]

where, for all $(f, P), (g, Q) \in \mathcal{A}$, it holds that

\[
(f, P) \sqsubseteq (g, Q) \iff f = g \land P \subseteq Q;
\]
\[
(f, P) \sqcap (g, Q) \overset{\mathrm{def}}{=}
\begin{cases}
(f, P \cap Q), & \text{if } f = g; \\
\bot, & \text{otherwise;}
\end{cases}
\qquad
(f, P) \sqcup (g, Q) \overset{\mathrm{def}}{=}
\begin{cases}
(f, P \cup Q), & \text{if } f = g; \\
\top, & \text{otherwise;}
\end{cases}
\]

and the bottom ($\bot$) and top ($\top$) elements are defined ad hoc to satisfy the properties
of the complete lattice.

Informally, given an abstract element $(f, P) = A \in \mathcal{A}$, for every concrete element
$C \in \mathcal{C}$ and every concrete location $l \in \mathcal{L}$, $f(C, l)$ is the abstract location that in $C$
abstracts $l$. The semantics of the abstract domain can thus be defined as follows.

Definition 5.2. (Extended abstract domain semantics.) Let $C \in \mathcal{C}$ and
$(f, P) = A \in \mathcal{A}$. We define

\[
C \in \gamma(A) \iff \bigl\{\, (f(C, l), f(C, m)) \bigm| (l, m) \in C \,\bigr\} \subseteq P.
\]

The initial definition of the concretization function (Definition 3.2) simply checks
whether all the pairs of $C$ are also in $A$; now, to handle the concept of abstract
locations, every concrete points-to pair $(l, m) \in C$ is abstracted, obtaining the pair
$(f(C, l), f(C, m))$, and then we check whether this "abstract pair" is in $P$. But the
distinction between concrete and abstract locations introduces a new problem in the
formalization of the abstract analysis.
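The membership test of Definition 5.2 can be sketched in C as follows. This is only an illustration under stated assumptions: locations are small integers, the abstract relation $P$ is stored as a boolean matrix, and the array `abs_of` plays the role of $f(C, \cdot)$ for one fixed concrete memory $C$. The names `AbsRelation`, `in_gamma`, `abs_of` and the bound `N_ABS` are hypothetical and do not come from the formal model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { N_ABS = 8 };  /* hypothetical bound on the number of abstract locations */

/* Abstract points-to relation P: pairs[i][j] is true when (i, j) is in P. */
typedef struct {
    bool pairs[N_ABS][N_ABS];
} AbsRelation;

/*
 * Returns true when the concrete memory belongs to the concretization of (f, P).
 * points_to[l] is the concrete location pointed to by l (a total function on
 * n concrete locations); abs_of[l] stands for f(C, l) for this fixed memory C.
 */
bool in_gamma(const size_t *points_to, const size_t *abs_of, size_t n,
              const AbsRelation *P)
{
    for (size_t l = 0; l < n; ++l) {
        size_t m = points_to[l];
        assert(m < n && abs_of[l] < N_ABS && abs_of[m] < N_ABS);
        if (!P->pairs[abs_of[l]][abs_of[m]])
            return false;  /* the abstracted pair (f(C,l), f(C,m)) is not in P */
    }
    return true;
}
```

In this sketch two concrete locations may share the same entry in `abs_of`, which is exactly the situation that makes strong updates delicate in the next section.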

5.2 Weak Updates and Strong Updates 

This section gives an insight into the distinction between weak and strong updates. In
the literature, the term update usually means an operation that acts on a memory,
concrete or abstract, modifying its state. An update can be triggered by any of the
usual operations, e.g., the assignment (Definition 3.10). However, the distinction
between strong and weak updates pertains only to the formalization of the abstract
domain. A strong update has the effect of overwriting the previous information
with new data; a weak update instead acts by merging the original with the new
data. Listings 28 and 29 present the different results of the analysis performed on
the same program: in the first case using strong updates, in the second
case using weak updates. Weak updates cannot increase the precision of the
approximation: each weak update yields a new abstraction
that subsumes the original information. Note that in Listing 29, to illustrate the
difference between the two options, we have forced the analysis to use weak updates.
However, there are situations where the use of weak updates is necessary to obtain
a safe approximation. Consider the example in Listing 30.

    int a, b, c, d, *p;
    p = &a;    // eval(*p) = {a}
    p = &b;    // eval(*p) = {b}
    p = &c;    // eval(*p) = {c}
    p = &d;    // eval(*p) = {d}

Listing 28: the annotations resulting from the use of strong updates.

    int a, b, c, d, *p;
    p = &a;    // eval(*p) = {a}
    p = &b;    // eval(*p) = {a,b}
    p = &c;    // eval(*p) = {a,b,c}
    p = &d;    // eval(*p) = {a,b,c,d}

Listing 29: the annotations resulting from the use of weak updates.

    int **pp, *p1, *p2, a, b, c;
    if (...) pp = &p1;
    else pp = &p2;
    p1 = &a;
    p2 = &c;
    *pp = &b;

Listing 30: an example where it is necessary to apply weak updates to obtain a safe
approximation.

The abstract execution reaches the last line of Listing 30 with the approximation

\[
\mathrm{eval}(*p1) = \{a\}, \qquad
\mathrm{eval}(*p2) = \{c\}, \qquad
\mathrm{eval}(*pp) = \{p1, p2\}.
\]

By applying the assignment as presented in Definition 3.10 we obtain the description

\[
\mathrm{eval}(*p1) = \{a, b\}, \qquad
\mathrm{eval}(*p2) = \{b, c\}, \qquad
\mathrm{eval}(*pp) = \{p1, p2\}.
\]

In this case the abstract assignment algorithm has performed a weak update: the
old values of the variables 'p1' and 'p2' are not overwritten. By forcing a strong
update we would obtain instead

\[
\mathrm{eval}(*p1) = \{b\}, \qquad
\mathrm{eval}(*p2) = \{b\}, \qquad
\mathrm{eval}(*pp) = \{p1, p2\},
\]

which is clearly a wrong approximation because there exists at least one concrete
execution such that, after the execution of the assignment '*pp = &b', $\mathrm{eval}(*pp) =
\{p2\}$ holds and then $\mathrm{eval}(*p1) = \{a\}$. Note that in the definition of the abstract
assignment (Definition 3.10), given $(e, f) \in \mathrm{Assignments}$, what triggers the use of
a strong instead of a weak update is the fact that the lhs $e$ evaluates to a single
location:

\[
K \overset{\mathrm{def}}{=}
\begin{cases}
\cdots, & \text{if } \#\,\mathrm{eval}(A, e) = 1; \\
\emptyset, & \text{otherwise,}
\end{cases}
\]

where $K$ denotes the set of the killed points-to pairs. The basic idea behind this
approach is that, when we have to update a set of more than one location, it is
possible that there exists a concrete memory description approximated by the current
abstraction in which only one of the locations of this set will be modified while the
others will retain their original value. In the above example, when 'pp' points to
'p1' then 'p2' is left unchanged by the assignment '*pp = &b'. Otherwise, when we
are sure that there is only one possible modified location, we can safely assume that in
none of the concrete memories described by the result of the abstract assignment
that location will still have the
old value. However, by distinguishing between concrete and abstract locations, we
are no longer able to discern when a strong update can be used. Now, even when
the lhs evaluates to a single location, $\mathrm{eval}(A, e) = \{l^\sharp\}$, we cannot safely apply a
strong update, as it is possible that $l^\sharp$ abstracts more than one concrete location.
To overcome this problem we introduce the following definition.

Definition 5.3. (Singular locations.) Let

\[
\mathrm{Singular} \subseteq \mathcal{A} \times \mathcal{L}^\sharp
\]

be defined as follows. Let $(f, P) = A \in \mathcal{A}$ and $l^\sharp \in \mathcal{L}^\sharp$. We say that the location
$l^\sharp$ is singular in the memory abstraction $A$ when

\[
(A, l^\sharp) \in \mathrm{Singular} \iff \forall C \in \gamma(A) : \#\{\, l \in \mathcal{L} \mid f(C, l) = l^\sharp \,\} \le 1.
\]

The above definition can be read as follows. We say that an abstract location $l^\sharp$
is singular with respect to the abstract memory description $A \in \mathcal{A}$ if there does not
exist any concrete memory description $C \in \gamma(A)$ such that $l^\sharp$ approximates more
than one of the locations of $C$. For convenience of notation we write $\mathrm{Singular}(A)$
to denote the set of the singular locations of the memory $A$, i.e.,

\[
\mathrm{Singular}(A) \overset{\mathrm{def}}{=} \{\, l^\sharp \mid (A, l^\sharp) \in \mathrm{Singular} \,\}.
\]

The abstract assignment operation (Definition 3.10) must be adapted in order to
provide a safe approximation. In particular, the definition of the kill set needs to
be rewritten as

\[
E \overset{\mathrm{def}}{=} \mathrm{eval}(A, e);
\qquad
K \overset{\mathrm{def}}{=}
\begin{cases}
E \times \mathcal{L}^\sharp, & \text{if } \#E = 1 \land E \subseteq \mathrm{Singular}(A); \\
\emptyset, & \text{otherwise.}
\end{cases}
\]
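The guarded kill set can be illustrated with a small C sketch. It assumes, only for illustration, that abstract locations are numbered from 0, that a set of locations is a bitmask, and that $\mathrm{Singular}(A)$ is available as a precomputed bitmask; the names `AbsMemory`, `can_kill`, `abs_assign` and the bound `N_LOCS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { N_LOCS = 8 };  /* hypothetical bound on the number of abstract locations */

typedef struct {
    uint32_t targets[N_LOCS];  /* targets[l]: bitmask of locations l may point to */
    uint32_t singular;         /* bitmask standing in for Singular(A) */
} AbsMemory;

/* Number of locations contained in a bitmask. */
static int count_bits(uint32_t x)
{
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

/* The kill set is non-empty only if E is a single, singular location. */
bool can_kill(const AbsMemory *m, uint32_t E)
{
    return count_bits(E) == 1 && (E & ~m->singular) == 0;
}

/* Abstract assignment: every location in E receives the targets in rhs. */
void abs_assign(AbsMemory *m, uint32_t E, uint32_t rhs)
{
    bool strong = can_kill(m, E);
    for (int l = 0; l < N_LOCS; ++l) {
        if (!(E & (1u << l)))
            continue;
        if (strong)
            m->targets[l] = rhs;    /* strong update: old pairs are killed */
        else
            m->targets[l] |= rhs;   /* weak update: old pairs are kept */
    }
}
```

Replaying Listing 30 with all locations marked singular, the last assignment has $E = \{p1, p2\}$, so `can_kill` fails and both targets are merged, giving $\{a, b\}$ and $\{b, c\}$ as in the text. A location left out of `singular` is always updated weakly, even when it is the only element of $E$.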

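The definition of the filter operation (Definition 3.16) must also be updated
accordingly. Given $x \in \mathcal{A} \times \wp(\mathcal{L}^\sharp) \times \mathrm{Expr}$ and $i \in \mathbb{N}$, we have

\[
K \overset{\mathrm{def}}{=} \mathcal{L}^\sharp \setminus \mathrm{targ}(x, i);
\qquad
T \overset{\mathrm{def}}{=} \mathrm{targ}(x, i + 1);
\qquad
\begin{cases}
\cdots & \\
\emptyset, & \text{otherwise.}
\end{cases}
\]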

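The definition of the filter for the 'neq' operator (Definition 3.18) also needs to be
updated accordingly. Given $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$, let

\[
I \overset{\mathrm{def}}{=} \mathrm{eval}(A, e) \cap \mathrm{eval}(A, f);
\qquad
E \overset{\mathrm{def}}{=} \mathrm{eval}(A, e) \setminus \mathrm{eval}(A, f);
\qquad
F \overset{\mathrm{def}}{=} \mathrm{eval}(A, f) \setminus \mathrm{eval}(A, e);
\]

then

\[
\phi(A, (\mathrm{neq}, e, f)) \overset{\mathrm{def}}{=}
\begin{cases}
\phi(A, E, e) \sqcup \phi(A, F, f), & \text{if } \#I = 1 \land I \subseteq \mathrm{Singular}(A); \\
A, & \text{otherwise.}
\end{cases}
\]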



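Finally, the definition of the alias relation induced by a points-to abstraction
must also be adapted in the same way. From Definition 3.7, for all $A \in \mathcal{A}$, we define
the induced relation $\mathrm{alias}^\sharp$ as follows. For every $e, f \in \mathrm{Expr}$, let

\[
E \overset{\mathrm{def}}{=} \mathrm{eval}(A, e);
\qquad
F \overset{\mathrm{def}}{=} \mathrm{eval}(A, f);
\]

then

\[
\mathrm{alias}^\sharp(e, f) \overset{\mathrm{def}}{=}
\begin{cases}
0, & \text{if } E \cap F = \emptyset; \\
1, & \text{if } \#E = 1 \land E = F \land E \subseteq \mathrm{Singular}(A); \\
\top, & \text{otherwise.}
\end{cases}
\]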
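The three-valued alias query above can be sketched in C as follows. The encoding is an assumption made for illustration only: the sets $E$, $F$ and $\mathrm{Singular}(A)$ are passed as bitmasks over abstract locations, and the names `AliasAnswer` and `alias_query` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The three possible answers: 0 (never alias), 1 (always alias), top (unknown). */
typedef enum { ALIAS_NO = 0, ALIAS_YES = 1, ALIAS_TOP = 2 } AliasAnswer;

/*
 * E and F are the bitmasks of abstract locations obtained by evaluating the
 * two expressions; singular is a bitmask standing in for Singular(A).
 */
AliasAnswer alias_query(uint32_t E, uint32_t F, uint32_t singular)
{
    if ((E & F) == 0)
        return ALIAS_NO;                              /* disjoint target sets */
    int single = (E != 0) && ((E & (E - 1)) == 0);    /* #E = 1 */
    if (single && E == F && (E & ~singular) == 0)
        return ALIAS_YES;                             /* same single, singular location */
    return ALIAS_TOP;
}
```

Note that two expressions evaluating to the same single non-singular location (for instance, the one location that approximates a whole array) yield `ALIAS_TOP`, since they may denote different concrete elements.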



With these modified definitions, assuming for instance that all the elements
of an array are approximated with only one (non-singular) abstract location, the
analysis applied to the code in Listing 31 produces the indicated annotations.

    int *p[10], *q;

    int main() {
        int x, y, z;
                        // eval(*p) = {null}
        p[...] = &x;    // eval(*p) = {null,x}
        p[...] = &y;    // eval(*p) = {null,x,y}
        p[...] = &x;    // eval(*p) = {null,x,y}
        p[...] = &z;    // eval(*p) = {null,x,y,z}

                        // eval(*q) = {null}
        q = &x;         // eval(*q) = {x}
        q = &y;         // eval(*q) = {y}
        q = &x;         // eval(*q) = {x}
        q = &z;         // eval(*q) = {z}
    }

Listing 31: this code shows the difference between singular and non-singular
locations. Remember that in the C language global variables are zero-initialized.
Assume that all the indices left unspecified are valid.
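As a worked reading of the first annotation of each group in Listing 31, the rewritten kill set can be unfolded as follows. The symbol $\ell_p$ is introduced only for this illustration and denotes the single non-singular abstract location assumed to approximate all the elements of 'p'; the location of 'q' is assumed to be singular.

```latex
\begin{align*}
\texttt{p[...] = \&x}:\quad
  & E = \{\ell_p\},\quad \#E = 1,\quad E \not\subseteq \mathrm{Singular}(A)
    \;\Longrightarrow\; K = \emptyset, \\
  & \text{so the targets of } \ell_p \text{ become }
    \{\mathrm{null}\} \cup \{x\} = \{\mathrm{null}, x\}; \\[4pt]
\texttt{q = \&x}:\quad
  & E = \{q\},\quad \#E = 1,\quad E \subseteq \mathrm{Singular}(A)
    \;\Longrightarrow\; K = E \times \mathcal{L}^\sharp, \\
  & \text{so the old pair } (q, \mathrm{null}) \text{ is killed and the targets of } q
    \text{ become } \{x\}.
\end{align*}
```

The same reasoning explains why the annotations for 'p' only grow, while those for 'q' always contain exactly the last assigned target.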



5.3 Notation 

In the following description we use more than once the concept of sequence. With
sequence we mean a set $S$ whose elements are enumerated, so that they can be identified
and compared by their position inside the sequence. With position we mean an
index ranging from $0$ up to $n$, where $n + 1$ is the number of elements^17 of $S$. For
convenience of notation we write '$S$.size' to denote the number of elements of the sequence
$S$; we write $S_i$ or $S(i)$ to denote the element of $S$ with index $i$, and $\mathrm{dom}(S)$ as an
abbreviation of the set of the indices of $S$, i.e., $\mathrm{dom}(S) = \{\, n \in \mathbb{N} \mid 0 \le n < S.\mathrm{size} \,\}$.
To explicitly represent the elements of the sequence we write $S = [S_0, \cdots, S_n]$.
When we are not interested in the definition of any particular order among the
elements of $S$, we use the concept of labelled set. A labelled set can be defined as
the triple $(F, L, S)$, where $S$ is the set of the labelled elements, $L$ is a set of labels
and $F \colon L \rightharpoonup S$ is a partial surjective labelling function that gives a unique name,
or label, to all the elements of $S$. For convenience of notation, when $F$ and $L$ are
clear from the context, we write only $S$ to refer to the labelled set $(F, L, S)$; we
write $S_l$ or $S(l)$ as an abbreviation of $F(l)$, and $\mathrm{dom}(S)$ as a shortcut for $\mathrm{dom}(F)$.
To explicitly represent the elements of $S$ we write $S = \{S_0, \cdots, S_n\}$.^18 Note that
this definition of labelled set is a generalization of the concept of sequence, where
$L = \mathbb{N}$; hence the following definitions given for labelled sets can also be applied
to sequences. We also use the concept of attribute. Given two labelled sets $S$ and
$A$, we say that the pair $(S, A)$ is a labelled set with attribute set $A$. Again, when
the attribute set $A$ is clear from the context, we write $S$ to mean the pair $(S, A)$;
we say that $S$ has the attribute $X$ to mean that $X \in \mathrm{dom}(A)$, and we write '$S.X$'
as a shortcut for '$A(X)$'.

^17 The concept of position is not defined for the empty sequence.

^18 That is, for the sole purpose of denoting the elements of $S$, we enumerate it.

5.4 The Concept of Memory Shape 

The abstract memory model that we want to describe is parametric with respect
to the underlying abstract domain, e.g., the points-to domain or some numerical
domain. In other words, the analysis can be seen as the coupling of a chosen abstract
domain and some additional 'structural' information, concerning for instance the
memory model of the target language/machine. With the concept of shape we want
to formalize this 'structural' information. Recalling the definition of the extended
abstract domain (Definition 5.1), this information is needed to identify the function
$f \in \Lambda$, that is, how concrete locations are mapped to abstract locations.

Definition 5.4. (Shape of a labelled set.) Let $(F, L, S)$ be a given labelled
set. We define the shape of $(F, L, S)$ as the other labelled set

\[
\mathrm{shape}((F, L, S)) \overset{\mathrm{def}}{=} (G, L, T),
\]

where

\[
T \overset{\mathrm{def}}{=} \{\, \mathrm{shape}(e) \mid e \in S \,\},
\]

and $G \colon L \rightharpoonup T$ is such that $\mathrm{dom}(G) = \mathrm{dom}(F)$ and is defined, for all $l \in \mathrm{dom}(F)$, as

\[
G(l) \overset{\mathrm{def}}{=} \mathrm{shape}(F(l)).
\]

Now let $(S, A)$ be a labelled set with the attribute set $A$. We define its shape as

\[
\mathrm{shape}((S, A)) \overset{\mathrm{def}}{=} (\mathrm{shape}(S), A).
\]

Note that, as a consequence of this definition, the shape of a sequence $S =
[S_0, \cdots, S_n]$ is the sequence of the shapes

\[
\mathrm{shape}([S_0, \cdots, S_n]) = [\mathrm{shape}(S_0), \cdots, \mathrm{shape}(S_n)].
\]

Note also that the shape function does not change the attributes of a labelled set.

5.5 Common Concepts 

The following sections will describe the structure of both the concrete and abstract 
memory models. Before proceeding we need to introduce some common concepts. 
We refer to [BHPZ07] for a rigorous formalization of some of the ideas that we 
present only informally. 

Location. The basic unit for describing the structure of the memory is the
concept of location. Each location has a 'type' attribute.

Allocation. We use the concept of allocation block to describe the unit of
allocation of the memory. An allocation block is a sequence of locations; it has a
'type' attribute and it is the base case of the inductive definition of the concept of
shape (Definition 5.4). We define the shape of an allocation block $A$ as its 'type'
attribute^19

\[
\mathrm{shape}(A) \overset{\mathrm{def}}{=} A.\mathrm{type}.
\]

^19 That is, the shape of an allocation is its 'type' attribute and not the sequence of the shapes of
its locations.

The type attribute of an allocation block uniquely determines the shape of the
sequence of its locations; the details of this aspect will be clarified later. Informally,
we say that each variable definition in the analyzed program has the effect of
creating an allocation block in the memory or, in the case of an abstract memory,
of updating an already existing allocation block. In the following, when clear from the
context, we call an allocation block simply an allocation.

To describe the structure of the concrete stack and its abstraction we introduce
these definitions.

Block. We use the term deallocation block to mean a sequence of allocations. The
deallocation block is the unit for the deallocation of stack-allocated memory. Ideally,
the deallocation block is intended to represent the concept of block of declarations
as it is defined by the C language. The order of the allocations inside a deallocation
block reflects the order of creation of the variables. For conformance with the C
Standard, when clear from the context we will refer to a deallocation block simply
as a block. A block can also be described as the portion of the stack between two
subsequent block marks [BHPZ07].

Frame. With frame we mean a sequence of blocks. In the concrete memory
model, a frame can also be characterized as the portion of the stack segment between
two subsequent link marks [BHPZ07]. Each link mark uniquely identifies the call
statement that has generated the link mark. To identify the call statements of the
program under analysis we use the concept of call site: each call statement in the
program is uniquely identified by a call site.^20 Each frame has a 'callsite' attribute.
The value of this attribute is equal to the call site of the link mark that closes the
frame; with this definition, given the program source code, the call site of a frame
uniquely determines the shape of the whole frame.

Example 16. Consider the code in Listing 32. The frame identified by the call
site 1, which corresponds to the call statement at line 8, can be described as

    [[int p], [int a, int b], [int c]].

Instead, the frame identified by the call site 2, which corresponds to the call statement
at line 12, can be described as

    [[int p], [int a, int b], [int d, int e]].

In the following we apply to the concepts just introduced the qualifiers concrete and
abstract. If X is a labelled set of Y objects, then with concrete X we mean a labelled
set of concrete Y objects; with abstract X we mean a labelled set of abstract Y
objects. For example, we call 'abstract frame' a sequence of abstract blocks; with
'concrete allocation' we mean a sequence of concrete locations. When the qualifier
abstract/concrete is not specified, the context will clarify which one is intended or whether
the statement is applicable to both cases.

^20 A reasonable choice to implement the call site concept is to use the program point associated
with the call statement. However, for clarity we want to keep the concepts of call site and
program point separate.

     1  void g();
     2
     3  void f(int p) {
     4      int a, b;
     5      if (...) {
     6          int c;
     7
     8          g();    // Call site 1
     9      } else {
    10          int d, e;
    11
    12          g();    // Call site 2
    13      }
    14  }

Listing 32: the call site completely identifies the shape of the frame.



5.6 The Concrete Memory Model 

The concrete memory is organized as a labelled set of segments. 

Text. The text segment is a labelled set of allocations used to represent the set
of the possible targets of function pointers: basically there is one allocation for
each function declared in the analyzed program. Each allocation is identified by
the program point associated with the function declaration.^21

Heap. The heap segment is a labelled set of allocations used to represent the
objects created using the functions of the 'malloc' family. In this segment each
allocation is labelled by an address^22 and has the attribute 'ppoint' (program point)
that uniquely identifies the statement that has caused the allocation. Note that,
once the analyzed program is fixed, the program point of the allocating statement
identifies the type attribute of the allocation, that is, the shape of the allocation.
As a consequence, given two heap allocations with the same program point attribute,
we know that these allocations also have the same shape.

Global. The global segment is a sequence of allocations that represent the global
variables of the analyzed program. Note that the order of the allocations inside the
global segment is not specified by the C Standard; thus, this detail is left to the
particular execution model implemented; for instance, this order may be influenced
by the particular combination of architecture and compiler chosen as the target for the
analysis.

Stack frames. The stack frames segment is a sequence of frames. The sequence
is organized such that the frame of index $0$ represents the topmost frame^23 on the
stack and the frame of index $n$, where $n + 1$ is the size of this segment, represents
the oldest frame.^24

Stack top. The stack top segment represents the locations above the topmost
link mark. It is a sequence of blocks (not contained in any frame), followed by a
sequence of allocations (not contained in any block).

^21 In case the same function is defined multiple times, obvious disambiguation methods are
necessary; for example, considering only the first occurrence of the declaration.

^22 At this level we are not interested in the details of the addressing scheme of the concrete
execution model. We simply require that each heap allocation can be identified inside the segment
by a tag or address.

^23 With topmost frame we mean the most recent frame on the stack, that is, the frame below the
topmost link mark.

^24 For instance, in the analysis of a complete program, the oldest frame, if present, is generated
by one of the call statements contained in the 'main()' function.

Then we define a concrete memory $a \in \mathrm{Mem}$ as a labelled set of the form

\[
a = \{\mathrm{text}, \mathrm{heap}, \mathrm{global}, \mathrm{stackframes}, \mathrm{stacktop}\}.
\]

For convenience we use the notation '$a.X$' to refer to the $X$ segment
of the memory $a$. For example, we write '$a$.text' to denote the text segment of $a$.
Before describing how the type attribute of a concrete allocation determines the
shape of the sequence of its locations, we need to introduce some notation.

Definition 5.5. (Concatenation of sequences.) Let $A = [A_0, \cdots, A_n]$ and
$B = [B_0, \cdots, B_m]$ be two sequences; then we define $A :: B$ as the concatenation of
the two sequences

\[
A :: B \overset{\mathrm{def}}{=} [A_0, \cdots, A_n, B_0, \cdots, B_m].
\]

Definition 5.6. (Concrete allocations.) We define the 'ALLOC' function by
structural induction on the set of types Types. Let $t \in \mathrm{Types}$. If $t$ is a scalar type
or a function type then^25

\[
\mathrm{ALLOC}(t) \overset{\mathrm{def}}{=} [t].
\]

If $t$ is an array of $n \in \mathbb{N}$ elements of type $t_0$ then

\[
\mathrm{ALLOC}(t) \overset{\mathrm{def}}{=} \underbrace{\mathrm{ALLOC}(t_0) :: \cdots :: \mathrm{ALLOC}(t_0)}_{n + 1 \text{ times}}.
\]

If $t$ is a structure type with fields $t_0\ \mathit{field}_0; \cdots; t_n\ \mathit{field}_n;$ we define

\[
\mathrm{ALLOC}(t) \overset{\mathrm{def}}{=} \mathrm{ALLOC}(t_0) :: \cdots :: \mathrm{ALLOC}(t_n).
\]

Example 17. Consider Listing 33; then we have

\[
\begin{aligned}
\mathrm{ALLOC}(\texttt{int[4]}) &= \underbrace{[\texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}]}_{5 \text{ times}}; \\
\mathrm{ALLOC}(\texttt{struct A}) &= [\texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{float}]; \\
\mathrm{ALLOC}(\texttt{struct B}) &= [\texttt{double}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{float}, \texttt{char}].
\end{aligned}
\]

^25 For the definition of the concept of type we refer to the C Standard [Int99, 6.2.5.21]: arithmetic
types and pointer types are collectively called scalar types. Array and structure types are
collectively called aggregate types.

    struct A {
        int a[4];
        float b;
    };

    struct B {
        double x;
        struct A a;
        char y;
    };

Listing 33: the definition of an aggregate type.
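The structural induction of Definition 5.6 can be sketched in C. This is a minimal illustration, not the implementation referred to in the text: the `Type` descriptor, its field names and the function `alloc_concrete` are hypothetical, and a location is represented just by the name of its scalar type.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_SCALAR, T_ARRAY, T_STRUCT } Kind;

/* Hypothetical type descriptor used only for this sketch. */
typedef struct Type {
    Kind kind;
    const char *name;                   /* scalar (or function) type name */
    const struct Type *elem;            /* element type, for arrays */
    size_t count;                       /* array length n, or number of fields */
    const struct Type *const *fields;   /* field types, for structures */
} Type;

/*
 * Appends ALLOC(t) to out[len..] and returns the new length.
 * cap is the capacity of out; the caller must provide enough room.
 */
size_t alloc_concrete(const Type *t, const char **out, size_t len, size_t cap)
{
    switch (t->kind) {
    case T_SCALAR:
        assert(len < cap);
        out[len++] = t->name;           /* ALLOC(t) = [t] */
        break;
    case T_ARRAY:
        /* n + 1 copies of the element allocation, as in Definition 5.6 */
        for (size_t i = 0; i < t->count + 1; ++i)
            len = alloc_concrete(t->elem, out, len, cap);
        break;
    case T_STRUCT:
        /* concatenation of the allocations of the fields, in order */
        for (size_t i = 0; i < t->count; ++i)
            len = alloc_concrete(t->fields[i], out, len, cap);
        break;
    }
    return len;
}
```

Building descriptors for the types of Listing 33 and calling `alloc_concrete` with an empty output buffer yields sequences of length 5, 6 and 8 for 'int[4]', 'struct A' and 'struct B' respectively, in agreement with Example 17.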



5.7 The Abstract Memory Model 

Now, having introduced these basic ingredients, we can describe the organization of
the abstract memory that, like the concrete memory model, is composed of different
segments.

Text. The text segment is a labelled set of abstract allocations that are used as
targets for function pointers. The definition of the abstract text segment is the
same as in the concrete case: there is one text location for each function declared in
the analyzed program and each location is labelled by the program point associated
with the function declaration.

Heap. The heap segment is a labelled set of allocations used to abstract all the
possible heap-allocated objects. Each heap allocation has as attribute the program
point of the statement that has caused the allocation, which is also used as label
to identify the allocation inside the segment. This means that the abstract heap
segment contains only one allocation for each allocating statement of the analyzed
program.

Global. The global segment is a sequence of allocations that represents the global
variables of the analyzed program. The order of the allocations inside the abstract
global segment is chosen to reflect the layout of the concrete global segment.

To represent the abstraction of the concrete stack we use three distinct segments.

Stack top. The stack top segment represents the portion of the stack above the
topmost link mark. As in the concrete case, the stack top is formalized as a sequence
of blocks (not contained in any frame), followed by a sequence of allocations (not
contained in any block).

Stack head. The stack head segment is a sequence of frames.

Stack tail. The stack tail segment is a labelled set of frames where each frame is
labelled by its 'callsite' attribute. This means that the stack tail contains at most
one frame for each of the possible call sites of the analyzed program.

Finally, we define an abstract memory $a \in \mathrm{Mem}^\sharp$ as a labelled set

\[
a = \{\mathrm{text}, \mathrm{heap}, \mathrm{global}, \mathrm{stacktail}, \mathrm{stackhead}, \mathrm{stacktop}\}.
\]

As for the concrete memory, for convenience of notation we write '$a.X$' to refer to
the $X$ segment of the abstract memory $a$; for example we write '$a$.text' to denote
the text segment of the abstract memory $a$. Now we present how the type of an
abstract allocation determines the shape of the sequence of its locations.

Definition 5.7. (Abstract allocations.) Let $t \in \mathrm{Types}$. If $t$ is a scalar type
or a function type, then

\[
\mathrm{ALLOC}^\sharp(t) \overset{\mathrm{def}}{=} [t].
\]

If $t$ is an array of $n \in \mathbb{N}$ elements of type $t_0$ then

\[
\mathrm{ALLOC}^\sharp(t) \overset{\mathrm{def}}{=} \mathrm{ALLOC}^\sharp(t_0) :: \mathrm{ALLOC}^\sharp(t_0) :: \mathrm{ALLOC}^\sharp(t_0).
\]

If $t$ is a structure type with fields $t_0\ \mathit{field}_0; \cdots; t_n\ \mathit{field}_n;$ we define

\[
\mathrm{ALLOC}^\sharp(t) \overset{\mathrm{def}}{=} \mathrm{ALLOC}^\sharp(t_0) :: \cdots :: \mathrm{ALLOC}^\sharp(t_n).
\]

Example 18. Consider again Listing 33; this time we have

\[
\begin{aligned}
\mathrm{ALLOC}^\sharp(\texttt{int[4]}) &= \underbrace{[\texttt{int}, \texttt{int}, \texttt{int}]}_{3 \text{ times}}; \\
\mathrm{ALLOC}^\sharp(\texttt{struct A}) &= [\texttt{int}, \texttt{int}, \texttt{int}, \texttt{float}]; \\
\mathrm{ALLOC}^\sharp(\texttt{struct B}) &= [\texttt{double}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{float}, \texttt{char}].
\end{aligned}
\]

Note that we approximate arrays using three parts. In Section 5.12 we show how
these parts can be used by the analysis.
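For the reader who wants to check Example 18 against Definition 5.7, the last equality can be unfolded step by step, using only the three cases of the definition and the types of Listing 33:

```latex
\begin{align*}
\mathrm{ALLOC}^\sharp(\texttt{struct B})
 &= \mathrm{ALLOC}^\sharp(\texttt{double}) :: \mathrm{ALLOC}^\sharp(\texttt{struct A})
    :: \mathrm{ALLOC}^\sharp(\texttt{char}) \\
 &= [\texttt{double}] ::
    \bigl(\mathrm{ALLOC}^\sharp(\texttt{int[4]}) :: \mathrm{ALLOC}^\sharp(\texttt{float})\bigr)
    :: [\texttt{char}] \\
 &= [\texttt{double}] ::
    \bigl([\texttt{int}, \texttt{int}, \texttt{int}] :: [\texttt{float}]\bigr)
    :: [\texttt{char}] \\
 &= [\texttt{double}, \texttt{int}, \texttt{int}, \texttt{int}, \texttt{float}, \texttt{char}].
\end{align*}
```

The only difference from the concrete unfolding is the array case, which always contributes three copies of the element allocation regardless of the array length.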

5.8 The Lattice Structure 

As in Section 3, we formalize the concrete domain as the complete lattice generated
by the powerset of the concrete memories Mem. Our next step is to introduce
the missing elements required to complete the structure of complete lattice for the
abstract domain. The bottom ($\bot$) and the top ($\top$) elements are defined ad hoc.
Now we introduce the two binary operations of meet ($\sqcap$) and join ($\sqcup$) and the
partial order ($\sqsubseteq$). In our analysis the operations of join and meet, as well as the
query on the partial order, always occur between abstractions having a similar
structure; these are the cases that we consider "interesting" and on which we define
the operations. However, since the formalization requires total operations, we
extend the definition to the "non-interesting" cases in a trivial way: when asked
to compute the join or the meet, we simply answer $\top$ and $\bot$, respectively. Note
that this is a specialization of the behaviour described in Definition 5.1. In this
sense, when we say that two elements of $\mathrm{Mem}^\sharp$, say $(f, P)$ and $(g, Q)$, share a similar
structure we mean that $f = g$. To formalize the concept of similar structure we
introduce the relation 'Compatible'.

Definition 5.8. (Compatibility between abstract memories.) Let

\[
\mathrm{Compatible} \subseteq \mathrm{Mem}^\sharp \times \mathrm{Mem}^\sharp
\]

be defined as follows. Let $A, B \in \mathrm{Mem}^\sharp$; then we say that $(A, B) \in \mathrm{Compatible}$
when the following conditions hold:

\[
\begin{aligned}
\mathrm{shape}(A.\mathrm{text}) &= \mathrm{shape}(B.\mathrm{text}); \\
\mathrm{shape}(A.\mathrm{heap}) &= \mathrm{shape}(B.\mathrm{heap}); \\
\mathrm{shape}(A.\mathrm{global}) &= \mathrm{shape}(B.\mathrm{global}); \\
\mathrm{shape}(A.\mathrm{stacktop}) &= \mathrm{shape}(B.\mathrm{stacktop}); \\
\mathrm{shape}(A.\mathrm{stackhead}) &= \mathrm{shape}(B.\mathrm{stackhead}).
\end{aligned}
\]

Note that in the definition of the 'Compatible' relation, no constraints are
specified on the shape of the stack tail segment.

Definition 5.9. (Abstract domain partial order.) Let $A$ and $B$ be two
labelled sets.^26 Let

\[
A \le B \iff \mathrm{dom}(A) \subseteq \mathrm{dom}(B) \land \forall l \in \mathrm{dom}(A) : A(l) \le B(l).
\]

Let $A, B \in \mathrm{Mem}^\sharp$. We say that

\[
A \sqsubseteq B \iff (A, B) \in \mathrm{Compatible} \land A \le B.
\]

Note that this definition proceeds inductively on the structure of the abstract
memory. The base case of this induction is given by locations. On locations, the definition
of the partial order '$\sqsubseteq$' and of the operations '$\sqcup$' and '$\sqcap$' depends on the particular
abstract domain adopted.

Example 19. With location address we mean a piece of information that allows us to
identify a location inside a memory. If the abstract memory is based on a points-to
domain, locations are formalized as sets of location addresses: a set of location
addresses is used to represent the set of the possibly pointed locations. In this case,
the partial order on locations is simply the relation of containment '$\subseteq$' between sets
of location addresses.

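Definition 5.10. (Abstract domain join operation.) Let $A$ and $B$ be
labelled sets. We define $A \vee B$ such that $\mathrm{dom}(A \vee B) = \mathrm{dom}(A) \cup \mathrm{dom}(B)$ and, for all
$l \in \mathrm{dom}(A \vee B)$,

\[
(A \vee B)(l) \overset{\mathrm{def}}{=}
\begin{cases}
A(l), & \text{if } l \in \mathrm{dom}(A) \setminus \mathrm{dom}(B); \\
B(l), & \text{if } l \in \mathrm{dom}(B) \setminus \mathrm{dom}(A); \\
A(l) \vee B(l), & \text{otherwise.}
\end{cases}
\]

Let $A, B \in \mathrm{Mem}^\sharp$. We define

\[
A \sqcup B \overset{\mathrm{def}}{=}
\begin{cases}
A \vee B, & \text{if } (A, B) \in \mathrm{Compatible}; \\
\top, & \text{otherwise.}
\end{cases}
\]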
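The pointwise order of Definition 5.9 and the join of Definition 5.10 can be sketched in C for the base case described in Example 19, where each location holds a set of location addresses. The representation is an assumption made for illustration: labels are the indices 0 to `N_LABELS - 1`, a flag marks which labels belong to the domain, and sets of addresses are bitmasks. The names `LabelledSet`, `ls_leq` and `ls_join` are hypothetical, and the 'Compatible' check on whole memories is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { N_LABELS = 4 };  /* hypothetical number of available labels */

typedef struct {
    bool     present[N_LABELS];  /* present[l]: label l belongs to dom(S) */
    uint32_t value[N_LABELS];    /* value[l]: set of location addresses (bitmask) */
} LabelledSet;

/* A <= B: dom(A) is contained in dom(B) and A(l) <= B(l) on dom(A). */
bool ls_leq(const LabelledSet *a, const LabelledSet *b)
{
    for (int l = 0; l < N_LABELS; ++l) {
        if (!a->present[l])
            continue;
        if (!b->present[l])
            return false;
        if ((a->value[l] & ~b->value[l]) != 0)
            return false;        /* A(l) is not contained in B(l) */
    }
    return true;
}

/* A v B: defined on dom(A) u dom(B), joining the values where both exist. */
LabelledSet ls_join(const LabelledSet *a, const LabelledSet *b)
{
    LabelledSet r;
    for (int l = 0; l < N_LABELS; ++l) {
        r.present[l] = a->present[l] || b->present[l];
        r.value[l] = (a->present[l] ? a->value[l] : 0u)
                   | (b->present[l] ? b->value[l] : 0u);
    }
    return r;
}
```

With this encoding both arguments are below their join, which is the property one expects from a least upper bound; labels present in only one argument are copied unchanged, as in the first two cases of the definition.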

Definition 5.11. (Abstract memory meet operation.) Let $A$ and $B$ be
labelled sets. We define $A \wedge B$ such that $\mathrm{dom}(A \wedge B) = \mathrm{dom}(A) \cap \mathrm{dom}(B)$ and, for all
$l \in \mathrm{dom}(A \wedge B)$,

\[
(A \wedge B)(l) \overset{\mathrm{def}}{=} A(l) \wedge B(l).
\]

Let $A, B \in \mathrm{Mem}^\sharp$. We define

\[
A \sqcap B \overset{\mathrm{def}}{=}
\begin{cases}
A \wedge B, & \text{if } (A, B) \in \mathrm{Compatible}; \\
\bot, & \text{otherwise.}
\end{cases}
\]

^26 As said above, this definition is valid also if $A$ and $B$ are sequences, as the sequence is a particular
case of labelled set.

In the computation of the meet operation it is possible to reach the bottom on
some of the locations.^27 Depending on the position of the locations inside the
abstract memory, this bottom can be propagated. If the bottom is reached on a
location contained in the stack tail, then the bottom can be propagated to the
frame that contains the location: this is equivalent to removing the frame from the
stack tail. If the bottom location is in any other segment, then the bottom can be
extended to the whole memory. The reason for this will be clarified by the definition
of the semantics of the abstract memory.



5.9 Concretization Function of the Abstract Memory 

This section presents the concretization function for the abstract memory model 
Mem" . The definition proceed by structural induction on the definition of abstract 
memory. The first step is to find a mapping between the shape of the concrete 
memory and the shape of the abstract memory. Note that at this point we are not 
interested in dealing with the value of the memory — which is defined by the value 
of the locations — but only in describing a relation about the shape. In other words, 
given an abstract element (/, P) G Mem" and a m G Mem, we are now trying to 
identify the function / G A is defined on m (Definition 5.1). As already done for 
the definition of the operations of meet and join, wc first formulate a compatibility 
relation to express the requirements on the structure of the concrete and abstract 
memories. 

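Definition 5.12. (Compatibility between concrete and abstract memories.) Let

\[ \mathrm{Compatible} \subseteq \mathrm{Mem} \times \mathrm{Mem}^\sharp. \]

Let $A \in \mathrm{Mem}^\sharp$ and $C \in \mathrm{Mem}$; then we say that
$(C, A) \in \mathrm{Compatible}$ when the following conditions hold:

\begin{align*}
&\mathrm{SHAPE}(C.\mathrm{text}) = \mathrm{SHAPE}(A.\mathrm{text}); \tag{1}\\
&\forall l \in \mathrm{dom}(C.\mathrm{heap}) : C.\mathrm{heap}(l).\mathrm{ppoint} \in \mathrm{dom}(A.\mathrm{heap}); \tag{2}\\
&\mathrm{SHAPE}(C.\mathrm{global}) = \mathrm{SHAPE}(A.\mathrm{global}); \tag{3}\\
&\mathrm{SHAPE}(A.\mathrm{stacktop}) = \mathrm{SHAPE}(C.\mathrm{stacktop}); \tag{4}\\
&A.\mathrm{stackhead}.\mathrm{size} \le C.\mathrm{stackframes}.\mathrm{size}; \tag{5}\\
&\forall i \in \{0, \dots, A.\mathrm{stackhead}.\mathrm{size} - 1\} :
  \mathrm{SHAPE}(A.\mathrm{stackhead}(i)) = \mathrm{SHAPE}(C.\mathrm{stackframes}(i)); \tag{6}\\
&\forall i \in \{A.\mathrm{stackhead}.\mathrm{size}, \dots, C.\mathrm{stackframes}.\mathrm{size} - 1\} :
  C.\mathrm{stackframes}(i).\mathrm{ppoint} \in \mathrm{dom}(A.\mathrm{stacktail}). \tag{7}
\end{align*}

Footnote: Locations represent elements of the underlying abstract domain. When computing
the meet of two locations, it is possible to reach the bottom of the abstract domain.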



In words, a concrete memory $C \in \mathrm{Mem}$ and an abstract memory
$A \in \mathrm{Mem}^\sharp$ are compatible when the following conditions hold.

(1) The shapes of the text segments must be equal. From the definition, both
the concrete and the abstract segment contain an abstract allocation for every
declared function. Hence, as long as A and C refer to the same program, this
is always true.

(2) Recall that, within the concrete heap segment, allocations are identified by
addresses, whereas in the abstract heap segment allocations are identified
by program points. For the heap segment we require that to each concrete
heap allocation there corresponds an abstract heap allocation identified by the
program point of the concrete allocation.

(3) The shapes of the global segments must be equal. From the definition of shape,
this implies that the global segments must contain the same number of allocations
and that each concrete allocation corresponds to an abstract allocation
with the same type. Again, as long as A and C refer to the same program, this
property is always true.

(4) The stack top segments must have the same shape; that is, the parts of the
stack above the topmost link mark must have the same shape.

(5) The stack frames segment of C does not contain fewer frames than the stack head
segment of A.

(6) The shape of the stack head segment of A must be a prefix of the shape of the
stack frames segment of C.

(7) The remaining part of the stack frames segment of C must be compatible with
the stack tail segment of A. Recall that in the stack tail the frames are identified
by their call site; thus, this means that to every remaining frame of C.stackframes
there corresponds in A.stacktail a frame with the same call site.
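The stack-related conditions (5), (6) and (7) can be illustrated with a minimal C sketch.
It is not taken from the source: the `Frame` structure, the integer encoding of shapes
and the representation of the stack tail as an array of call-site identifiers are
assumptions made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame summary: an integer code stands for SHAPE(frame) and
   an integer identifies the call site that created the frame. */
typedef struct {
    int shape;
    int callsite;
} Frame;

/* True if the abstract stack tail contains a frame labelled by callsite. */
static bool tail_has_callsite(const int *tail_sites, size_t n_tail, int callsite) {
    for (size_t j = 0; j < n_tail; ++j)
        if (tail_sites[j] == callsite) return true;
    return false;
}

/* Checks conditions (5)-(7) for a concrete stack `conc` against an abstract
   stack made of `head` (stack head segment) and `tail_sites` (the call sites
   labelling the stack tail segment). */
bool stack_compatible(const Frame *conc, size_t n_conc,
                      const Frame *head, size_t n_head,
                      const int *tail_sites, size_t n_tail) {
    assert(n_conc == 0 || conc != NULL);
    assert(n_head == 0 || head != NULL);
    if (n_head > n_conc) return false;                       /* condition (5) */
    for (size_t i = 0; i < n_head; ++i)                      /* condition (6) */
        if (head[i].shape != conc[i].shape) return false;
    for (size_t i = n_head; i < n_conc; ++i)                 /* condition (7) */
        if (!tail_has_callsite(tail_sites, n_tail, conc[i].callsite)) return false;
    return true;
}
```

For instance, a concrete stack of three frames is compatible with an abstract stack whose
head holds one frame of the same shape and whose tail contains the call site shared by the
two remaining concrete frames.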

Given a concrete memory $m \in \mathrm{Mem}$ and an abstract memory
$m^\sharp = (f, P) \in \mathrm{Mem}^\sharp$, once we know that $m$ is compatible with
$m^\sharp$, we ask how the locations of $m$ map onto the locations of $m^\sharp$, that is,
how the function $f(m, \cdot)$ is defined, as this is required in order to complete the
definition of the semantics of the abstract domain (Definition 5.2). Before going into
the details we introduce the idea behind the approach. By looking at the definitions of
the concrete and abstract memories, note that these objects can be seen as trees: every
labelled set is a node with its elements as children. If memories are trees, then we can
characterize locations as the leaves. In other words, a location can be uniquely
identified within a memory by the path that connects the root of the tree to the
corresponding leaf node. Under these assumptions we can identify concrete location
addresses with the paths inside the concrete memories and abstract location addresses
with the paths inside the abstract memories. We can now restate our initial problem as
the problem of determining a mapping from paths on a concrete tree to paths on an
abstract tree. To do this we exploit the recursive structure of trees: for each
subtree $S^\sharp$ of $m^\sharp$ we have to identify the set of subtrees
$S_0, \dots, S_n$ of $m$ that are mapped into $S^\sharp$; since leaves are the limit case
of subtrees, we end up with a map from the leaves of $m$ to the leaves of $m^\sharp$.
To formalize this mapping we use triples of the form

\[ (S^\sharp, \{S_0, \dots, S_n\}, M), \]

where $S^\sharp$ is a subtree of $m^\sharp$, $S_0, \dots, S_n$ are the subtrees of $m$
mapped into $S^\sharp$ and $M$ is a map that defines how the children of
$S_0, \dots, S_n$ are mapped to the children of $S^\sharp$.

Definition 5.13. (Concretization of allocations.) We define the function MAP by
structural induction on the set Types. Let $t \in \mathrm{Types}$; then

\[
\mathrm{MAP}(t) \stackrel{\mathrm{def}}{=}
\begin{cases}
\{(0, \{0\}, \emptyset)\},
  & \text{if $t$ is a scalar or function type;} \\
\{(0, \{0\}, \mathrm{MAP}(t_0)),\ (1, \{1, \dots, n-1\}, \mathrm{MAP}(t_0)),\ (2, \{n\}, \mathrm{MAP}(t_0))\},
  & \text{if $t$ is an array of size $n$ of type $t_0$;} \\
\{(i, \{i\}, \mathrm{MAP}(t_i)) \mid i \in \{0, \dots, n\}\},
  & \text{if $t = \mathrm{struct} : t_0\ \mathrm{field}_0, \dots, t_n\ \mathrm{field}_n$.}
\end{cases}
\]

Let $a_0, \dots, a_n, a^\sharp$ be allocations such that

\[ \forall i \in \{0, \dots, n\} : a^\sharp.\mathrm{type} = a_i.\mathrm{type}. \]

Then we define

\[ \mathrm{MAP}(a^\sharp, \{a_0, \dots, a_n\}) \stackrel{\mathrm{def}}{=}
   (a^\sharp, \{a_0, \dots, a_n\}, \mathrm{MAP}(a_0.\mathrm{type})). \]

Recall that allocations are the base case of the definition of shape: the shape
of an allocation is its type attribute. This means that the above condition on the
types of the allocations is equivalent to saying that $a_0, \dots, a_n, a^\sharp$ must
have the same shape.

Definition 5.14. (Concretization of labelled sets.) Let $S_0, \dots, S_n, S^\sharp$ be
labelled sets such that

\[ \forall i \in \{0, \dots, n\} : \mathrm{SHAPE}(S^\sharp) = \mathrm{SHAPE}(S_i). \]

Then we define $\mathrm{MAP}(S^\sharp, \{S_0, \dots, S_n\})$ as the triple

\[
\Bigl( S^\sharp,\ \{S_0, \dots, S_n\},\
  \bigl\{ \mathrm{MAP}\bigl(S^\sharp(l), \{S_0(l), \dots, S_n(l)\}\bigr) \bigm| l \in \mathrm{dom}(S_0) \bigr\} \Bigr).
\]

Note that, from the definition of labelled set, if $S_0, \dots, S_n, S^\sharp$ have the
same shape, then they also have the same domain (Definition 5.4); that is, the definition
is well formed.
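The array case of Definition 5.13 can be read as a function from concrete child indices
to abstract child indices. The following C sketch shows that reading only; the function
and constant names are hypothetical and do not come from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Abstract child indices used by MAP for arrays (Definition 5.13):
   0 is the head, 1 is the tail, 2 is the off-by-one element. */
enum { ABS_HEAD = 0, ABS_TAIL = 1, ABS_OFF_BY_ONE = 2 };

/* Maps the concrete child index i of an array of size n to its abstract
   child index. The concrete children are 0..n, where child n is the
   off-by-one element. */
int array_child_abstraction(size_t i, size_t n) {
    assert(n >= 1 && i <= n);
    if (i == 0) return ABS_HEAD;          /* triple (0, {0}, MAP(t0))        */
    if (i == n) return ABS_OFF_BY_ONE;    /* triple (2, {n}, MAP(t0))        */
    return ABS_TAIL;                      /* triple (1, {1..n-1}, MAP(t0))   */
}
```

Note that for an array of size 1 no concrete index is mapped to the tail, since the set
{1, ..., n-1} is empty.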

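Definition 5.15. (Concretization of memories.) Let $m \in \mathrm{Mem}$ and
$m^\sharp \in \mathrm{Mem}^\sharp$ be such that

\[
\begin{aligned}
m &= \{\mathrm{text}, \mathrm{heap}, \mathrm{global}, \mathrm{stackframes}, \mathrm{stacktop}\}, \\
m^\sharp &= \{\mathrm{text}^\sharp, \mathrm{heap}^\sharp, \mathrm{global}^\sharp,
              \mathrm{stacktail}, \mathrm{stackhead}, \mathrm{stacktop}^\sharp\}.
\end{aligned}
\]

If $(m, m^\sharp) \in \mathrm{Compatible}$ we define $\mathrm{MAP}(m, m^\sharp)$ as the set

\[
\begin{aligned}
& \bigl\{ \mathrm{MAP}(\mathrm{text}^\sharp, \{\mathrm{text}\}),\
          \mathrm{MAP}(\mathrm{global}^\sharp, \{\mathrm{global}\}),\
          \mathrm{MAP}(\mathrm{stacktop}^\sharp, \{\mathrm{stacktop}\}) \bigr\} \\
& \cup \bigl\{ \mathrm{MAP}\bigl(a^\sharp, \{ a \in \mathrm{heap} \mid a.\mathrm{ppoint} = a^\sharp.\mathrm{ppoint} \}\bigr)
          \bigm| a^\sharp \in \mathrm{heap}^\sharp \bigr\} \\
& \cup \bigl\{ \mathrm{MAP}\bigl(f^\sharp, \{ \mathrm{stackframes}(i) \mid
          \mathrm{stackframes}(i).\mathrm{callsite} = f^\sharp.\mathrm{callsite},\
          i \in \{\mathrm{stackhead}.\mathrm{size}, \dots, \mathrm{stackframes}.\mathrm{size} - 1\} \}\bigr)
          \bigm| f^\sharp \in \mathrm{stacktail} \bigr\} \\
& \cup \bigl\{ \mathrm{MAP}\bigl(\mathrm{stackhead}(i), \{\mathrm{stackframes}(i)\}\bigr)
          \bigm| i \in \{0, \dots, \mathrm{stackhead}.\mathrm{size} - 1\} \bigr\}.
\end{aligned}
\]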



Note that the requirement of compatibility between the concrete memory $m$
and the abstraction $m^\sharp$ ensures that the function MAP is well defined. Once
the definition of the function $f$ is completed, the semantics of the abstraction can be
completed following the idea described in Definition 5.2. Alternatively, using the
approach informally presented in the introduction (Section 2.6), the concretization
function can be expressed in terms of approximation between locations, thus relying
on the definition of the concretization function for the elements of the underlying
abstract domain. Let $m \in \mathrm{Mem}$ and $m^\sharp = (f, P) \in \mathrm{Mem}^\sharp$,
where $f$ is the location abstraction function of $m^\sharp$; then we say that
$m \in \gamma(m^\sharp)$ when

\[ \forall l \in \mathcal{L} : f(m, l) \text{ is defined} \implies
   m[l] \in \gamma\bigl(m^\sharp[f(m, l)]\bigr); \]

which for a points-to domain can also be written as

\[ \forall l \in \mathcal{L} : f(m, l) \text{ is defined} \implies
   f(m, \mathrm{post}(m, l)) \in \mathrm{post}(m^\sharp, f(m, l)). \]
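The points-to form of the condition above can be checked mechanically on a small finite
model. The C sketch below is only an illustration under simplifying assumptions that are
not in the source: concrete locations and abstract locations are numbered from 0, each
concrete location holds at most one pointer target, the abstract points-to information
is a bitmask per abstract location, and an undefined abstraction of a pointer target is
treated as a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_TARGET (-1)   /* the concrete location holds no pointer          */
#define UNDEFINED (-1)   /* the abstraction function f is undefined here    */

/* post[l]     : concrete location pointed to by l, or NO_TARGET
   f[l]        : abstract location of l, or UNDEFINED
   abs_post[a] : bitmask of the abstract locations that a may point to
   Returns true when, for every l with f(l) defined,
   f(post(l)) is a member of abs_post[f(l)]. */
bool in_concretization(const int *post, const int *f, size_t n_conc,
                       const unsigned *abs_post, size_t n_abs) {
    assert(n_abs <= 32);
    for (size_t l = 0; l < n_conc; ++l) {
        if (f[l] == UNDEFINED || post[l] == NO_TARGET) continue;
        assert((size_t)f[l] < n_abs && (size_t)post[l] < n_conc);
        int abs_target = f[post[l]];
        if (abs_target == UNDEFINED) return false;
        assert((size_t)abs_target < n_abs);
        if ((abs_post[f[l]] & (1u << abs_target)) == 0) return false;
    }
    return true;
}
```

As a usage example, take a pointer p (location 0) and an array of size 3 (locations 1 to
4, the last being the off-by-one element), abstracted as p, H, T, T, O. If p points to the
second array element, the concrete memory is in the concretization of an abstract memory
in which p may point to T, but not of one in which p may point only to H.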

5.9.1 Singular Locations. The definition of singular location introduced in
Definition 5.3 is not applicable in a practical implementation, as it would require
explicitly checking the existence of an $m$ in the concretization of $m^\sharp$ with
certain properties. As a consequence we need a safe approximation of the set of singular
locations of an abstract memory. From the above definitions it can easily be seen
that every abstract location that represents the middle part of an array of size not
less than three is certainly non-singular. The same holds for the stack tail segment:
each frame in this segment can represent several concrete frames; hence, during the
analysis we assume that all the locations contained in the stack tail are non-singular.
The same applies to heap allocations: it is impossible to tell, for a given allocating
statement, whether it can be executed at most once; in other words, it is impossible to
tell whether there exists an $m \in \gamma(m^\sharp)$ such that a given abstract heap
allocation abstracts several concrete heap allocations. As a consequence, we safely
assume that all abstract heap allocations are non-singular.
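The safe approximation described above amounts to a simple syntactic test on abstract
locations. The C sketch below is one possible rendering; the `Segment`, `ArrayPart` and
`AbsLocation` types are hypothetical, and nested aggregates (for example a structure
stored in the tail of an array) are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SEG_TEXT, SEG_GLOBAL, SEG_HEAP,
               SEG_STACKTOP, SEG_STACKHEAD, SEG_STACKTAIL } Segment;
typedef enum { PART_NONE, PART_HEAD, PART_TAIL, PART_OFF_BY_ONE } ArrayPart;

/* Hypothetical descriptor of an abstract location. */
typedef struct {
    Segment   segment;     /* memory segment that contains the location      */
    ArrayPart part;        /* PART_NONE if the location is not in an array   */
    size_t    array_size;  /* meaningful only when part != PART_NONE         */
} AbsLocation;

/* Returns true only for locations that the analysis may treat as singular.
   Answering false for a location that is actually singular is safe; it only
   costs precision. */
bool assumed_singular(const AbsLocation *loc) {
    assert(loc != NULL);
    if (loc->segment == SEG_HEAP || loc->segment == SEG_STACKTAIL)
        return false;   /* may stand for several concrete allocations/frames */
    if (loc->part == PART_TAIL && loc->array_size >= 3)
        return false;   /* stands for at least two concrete elements         */
    return true;
}
```

Under this sketch the tail of an array of size 2 is still considered singular, because it
represents exactly one concrete element.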

5.10 Abstract Operations 

This section presents some informal considerations about the remaining operations
required in order to complete the description of the execution model. We have
already described the problem of formalizing operations on the memory model
in Section 2.2: some operations are necessary to formulate the concrete execution
model Mem; these are then generalized to the concrete domain $\wp(C)$ and an
approximation on A is provided. Consider for instance the assignment operation. Other
operations are not required by the concrete execution model, but are useful for the
analysis; these operations are directly formulated on the concrete domain $\wp(C)$ and,
as usual, an abstract counterpart is formulated on A. Consider for instance the
filter, merge and meet operations. A more rigorous description of some of these
is presented in [BHPZ07].

5.10.1 Notation. Before proceeding we introduce some notation. Let $A$ be a
non-empty sequence. We write $A = [H \mid T]$ to denote by $H$ the first element of $A$,
also called the head of $A$, and by $T$ the remaining part of $A$, also called
the tail of $A$. We denote by '[]' the empty sequence.

5.10.2 The Mark Operation. This operation has the effect of closing the current
block. In our memory model we have modeled the stack top segment as a sequence of
blocks 'Bs' not contained in any frame, followed by a sequence of allocations 'As'
not contained in any block. Let $m \in \mathrm{Mem}$ be such that

\[ m.\mathrm{stacktop} = [As, Bs]. \]

Then we have

\[ \mathrm{MARK}(m) = m_0 \in \mathrm{Mem} \]

such that

\[ m_0.\mathrm{stacktop} = [[\,], [As \mid Bs]], \]

while the rest of the memory is left unchanged. In words, the allocations 'As'
present in the stack top segment are moved into a block at the head of the sequence
of blocks 'Bs'. The abstract mark operation is defined in the same way.

5.10.3 The Link Operation. This operation has the effect of creating a new
frame on the stack. Let $m \in \mathrm{Mem}$ be such that

\[ m.\mathrm{stacktop} = [[\,], [B \mid Bs]], \qquad m.\mathrm{stackframes} = Fs. \]

Let

\[ \mathrm{LINK}(m) = m_0 \in \mathrm{Mem}; \]

then we have

\[ m_0.\mathrm{stacktop} = [[\,], [B]], \qquad m_0.\mathrm{stackframes} = [Bs \mid Fs], \]

and the rest of the memory is left unchanged. The block denoted above as B
is intended to represent the arguments and the return value of the function call
that has triggered the link operation. To emulate the passing of the arguments from the
caller to the called context, the allocations of the block B are left in the stack
top segment. The abstract operation is formulated similarly; the only difference
is that the 'stackhead' segment is used instead of the stack frames segment. Let
$m^\sharp \in \mathrm{Mem}^\sharp$ be such that

\[ m^\sharp.\mathrm{stacktop} = [[\,], [B \mid Bs]], \qquad m^\sharp.\mathrm{stackhead} = Fs. \]

Let

\[ \mathrm{LINK}^\sharp(m^\sharp) = m_0^\sharp \in \mathrm{Mem}^\sharp; \]

then we have

\[ m_0^\sharp.\mathrm{stacktop} = [[\,], [B]], \qquad m_0^\sharp.\mathrm{stackhead} = [Bs \mid Fs]. \]

5.10.4 The New Variable Operation. This operation is required to populate the
stack. Ideally this operation can be split in two parts: first, the creation of the new
allocation; second, the initialization of its locations. Since the initialization can be
treated as a sequence of assignments, here we consider only the creation of the new
locations. Let $m \in \mathrm{Mem}$ be such that

\[ m.\mathrm{stacktop} = [As, Bs]. \]

Let $t \in \mathrm{Types}$ be the type of the allocated object and let

\[ \mathrm{NEW}_S(m, t) = m_0 \in \mathrm{Mem}. \]

We have

\[ m_0.\mathrm{stacktop} = [[A \mid As], Bs], \]

where (Definition 5.6)

\[ A = \mathrm{ALLOC}(t). \]

Again, the abstract operation is defined in the same way, except that the new
allocation A is defined as (Definition 5.7) $A = \mathrm{ALLOC}^\sharp(t)$.

5.10.5 The Unlink Operation. This operation can be thought of as the inverse of
the link operation: if the link emulates the effects of a call statement then the
unlink emulates the effects of a return statement. Let $m \in \mathrm{Mem}$ be such that

\[ m.\mathrm{stackframes} = [F \mid Fs], \qquad m.\mathrm{stacktop} = [[\,], [B]]. \]

The block B contains the arguments and the return value of the called function
that are returned to the caller context. In particular we assume that the stack top
contains only one block and that the stack frames segment contains at least one
frame; in words, this requires that every return statement must be preceded by
a call statement. Let

\[ \mathrm{UNLINK}(m) = m_0 \in \mathrm{Mem}. \]

We have

\[ m_0.\mathrm{stackframes} = Fs, \qquad m_0.\mathrm{stacktop} = [[\,], [B \mid F]], \]

while the rest of the memory is left unchanged. Note that the topmost
frame F of the stack frames segment of $m$ has been moved in $m_0$ to the stack
top segment and the block B has been prepended to it. The abstract
operation is defined in essentially the same way; the only difference is that the
'stackhead' segment is used instead of the 'stackframes' segment.

5.10.6 The Unmark Operation. This operation can be thought of as the inverse of
the mark operation: if the mark operation creates a new block gathering all the
ungrouped allocations of the stack top, then the unmark operation deletes these
allocations and replaces them with the allocations contained in the topmost block.

Let $m \in \mathrm{Mem}$ be such that

\[ m.\mathrm{stacktop} = [As, [B \mid Bs]]. \]

Let

\[ \mathrm{UNMARK}(m) = m_0 \in \mathrm{Mem}. \]

We have

\[ m_0.\mathrm{stacktop} = [B, Bs], \]

while the rest of the memory is left unchanged. Note that the sequence of allocations
'As' has been removed and in its place we now find the allocations of the block B.
The abstract operation is defined in the same way. It is worth stressing that the
implementation of this abstract operation probably requires an additional step to
notify the remaining locations that the locations in 'As' no longer exist; for instance,
this is required for a pointer that was pointing to one of the deallocated locations
(As). In this case, depending on the concrete execution model adopted, this pointer
can be marked as undefined.
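The interplay of the new variable, mark and unmark operations on the stack top can be
sketched in C as follows. The representation is an assumption made for this example and
is not the one used in the source: allocations are plain integer identifiers kept in one
flat array, and each mark records how many allocations were present when it was placed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ALLOCS 64
#define MAX_BLOCKS 16

/* Hypothetical stack top: the ungrouped allocations 'As' are those stored
   above the most recent mark; every mark closes one block of 'Bs'. */
typedef struct {
    int    allocs[MAX_ALLOCS];  /* allocation identifiers, oldest first       */
    size_t n_allocs;
    size_t marks[MAX_BLOCKS];   /* marks[j]: allocations present at mark j    */
    size_t n_marks;
} StackTop;

/* New variable: [As, Bs] becomes [[A | As], Bs]. */
void new_var(StackTop *s, int alloc_id) {
    assert(s->n_allocs < MAX_ALLOCS);
    s->allocs[s->n_allocs++] = alloc_id;
}

/* Mark: [As, Bs] becomes [[], [As | Bs]]. */
void mark(StackTop *s) {
    assert(s->n_marks < MAX_BLOCKS);
    s->marks[s->n_marks++] = s->n_allocs;
}

/* Unmark: [As, [B | Bs]] becomes [B, Bs]; the allocations in 'As' are
   dropped and the allocations of the topmost block become ungrouped. */
void unmark(StackTop *s) {
    assert(s->n_marks > 0);
    s->n_allocs = s->marks[--s->n_marks];
}

/* Number of allocations currently in 'As'. */
size_t ungrouped_count(const StackTop *s) {
    size_t base = s->n_marks ? s->marks[s->n_marks - 1] : 0;
    return s->n_allocs - base;
}
```

A real implementation would also have to invalidate pointers to the allocations dropped by
`unmark`, as noted in the text; the sketch omits that step.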

In our model we use the following operations to set the degree of context-sensitivity
of the analysis and to approximate recursive function calls. Both these operations
have no effect on the concrete domain, that is, for all $m \in \mathrm{Mem}$ we have
$\mathrm{op}(m) = m$. In terms of the approximation this means that for all
$m^\sharp \in \mathrm{Mem}^\sharp$ we have that
$\gamma(m^\sharp) \subseteq \gamma(\mathrm{op}(m^\sharp))$.

5.10.7 The Stack Tail Push Operation. This operation has the effect of moving
the oldest frame of the stack head segment (from now on the 'pushed frame') to the
stack tail. Recall that the stack tail segment is a labelled set of frames where each
frame is identified by a call site and that the call site uniquely identifies the shape
of the frame. This means that for each call site the stack tail can contain only one
frame. Thus, if it already contains a frame with the same call site as the pushed
frame, then the pushed frame is merged into the corresponding stack tail frame.
Otherwise, if no frame with the same call site is already present, the frame is
simply added to the stack tail. Let $m^\sharp \in \mathrm{Mem}^\sharp$ be such that

\[ m^\sharp.\mathrm{stackhead} = Fs :: [F], \qquad
   m^\sharp.\mathrm{stacktail} = \{F_0, \dots, F_n\}, \]

where F denotes the last element of the non-empty stack head segment, 'Fs' denotes
the remaining part of the same sequence and
$n = m^\sharp.\mathrm{stacktail}.\mathrm{size} \in \mathbb{N}$. Let

\[ \mathrm{TAILPUSH}^\sharp(m^\sharp) = m_0^\sharp \in \mathrm{Mem}^\sharp; \]

then we have

\[
m_0^\sharp.\mathrm{stackhead} = Fs, \qquad
m_0^\sharp.\mathrm{stacktail} =
\begin{cases}
\{F_0 \sqcup F, F_1, \dots, F_n\}, & \text{if } F_0.\mathrm{callsite} = F.\mathrm{callsite}; \\
\{F, F_0, \dots, F_n\}, & \text{otherwise,}
\end{cases}
\]

where $F_0 \sqcup F$ denotes the merge of the two frames. Note that the stack tail segment
is a labelled set, thus the order indicated above, $F_0, \dots, F_n$, among its frames is
completely artificial and introduced for notational convenience: writing
$F_0.\mathrm{callsite} = F.\mathrm{callsite}$ we mean that there exists a frame in
the stack tail with the same call site as F.

5.10.8 The Stack Tail Pop Operation. This operation is the inverse of the stack
tail push: it moves a frame from the stack tail back into the stack head segment.
To do this we have to specify which frame to restore; that is, the stack tail pop
operation requires a call site. Let

\[ \mathrm{TAILPOP}^\sharp : \mathrm{Mem}^\sharp \times \mathrm{Callsites} \to \mathrm{Mem}^\sharp. \]

Given $c \in \mathrm{Callsites}$ and $m^\sharp \in \mathrm{Mem}^\sharp$, if the stack tail
segment of $m^\sharp$ does not contain any frame labelled $c$ then the operation results
in the $\bot$ element. Otherwise let

\[ m^\sharp.\mathrm{stackhead} = Fs, \qquad
   m^\sharp.\mathrm{stacktail} = \{F_0, \dots, F_n\} \]

be such that $F_0.\mathrm{callsite} = c$. Then, calling

\[ \mathrm{TAILPOP}^\sharp(m^\sharp, c) = m_0^\sharp \in \mathrm{Mem}^\sharp, \]

we have

\[ m_0^\sharp.\mathrm{stackhead} = Fs :: [F_0], \]

while the rest of the memory, including the stack tail segment, remains unchanged.

5.11 Approximating the Stack

The concept of stack tail is introduced precisely to handle recursion. In the presence
of recursive function calls, the number of frames on the concrete stack cannot be
limited by any finite bound. Beyond these theoretical considerations, from a purely
practical perspective it is infeasible to keep an arbitrary number of distinct abstract
frames. The idea of our abstraction is to represent 'precisely' the variables of the
local environment, approximated by the stack top segment, and the global variables,
represented by the global segment. The topmost k frames of the concrete stack are also
abstracted 'precisely' by the stack head segment. However, the content of the concrete
stack below the first k frames is approximated more roughly in the stack tail segment.
Frames in the stack tail are identified by their call site; this means that the concrete
frames labelled by the same call site c that are below the k-th topmost frame are all
approximated by the same abstract frame, which is contained in the stack tail and is
identified by c.
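The stack tail push and the bound k on the stack head can be illustrated with a small C
sketch. Everything in it is an assumption made for illustration: a frame is reduced to a
call-site identifier plus a bitmask that stands for its abstract content, the merge of two
frames is the bitwise union of those masks, and the hypothetical `push_call` helper
combines the creation of a frame with the push of the oldest frame when the head would
exceed k frames.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16

/* Hypothetical abstract frame. */
typedef struct {
    int      callsite;   /* call site that identifies the frame              */
    unsigned points_to;  /* stand-in for the abstract content of the frame   */
} AbsFrame;

typedef struct {
    AbsFrame head[MAX_FRAMES]; size_t n_head;  /* newest first, oldest last  */
    AbsFrame tail[MAX_FRAMES]; size_t n_tail;  /* one frame per call site    */
} AbsStack;

/* Moves the oldest frame of the stack head into the stack tail, merging it
   with the tail frame that has the same call site, if any. */
void tail_push(AbsStack *s) {
    assert(s->n_head > 0);
    AbsFrame pushed = s->head[--s->n_head];
    for (size_t j = 0; j < s->n_tail; ++j) {
        if (s->tail[j].callsite == pushed.callsite) {
            s->tail[j].points_to |= pushed.points_to;   /* merge */
            return;
        }
    }
    assert(s->n_tail < MAX_FRAMES);
    s->tail[s->n_tail++] = pushed;
}

/* Adds a new frame on top of the stack head and keeps at most k frames in
   the head by pushing the oldest one into the tail. */
void push_call(AbsStack *s, AbsFrame f, size_t k) {
    assert(k >= 1 && k < MAX_FRAMES && s->n_head <= k);
    for (size_t i = s->n_head; i > 0; --i) s->head[i] = s->head[i - 1];
    s->head[0] = f;
    s->n_head++;
    if (s->n_head > k) tail_push(s);
}
```

With k = 1 and three recursive calls from the same call site, the head ends up holding
only the newest frame, while the tail holds a single frame that summarizes the two older
ones.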

5.12 Pointer Arithmetic 

This section presents a prototype for handling pointer arithmetic. Complex approaches
to this problem are already present in the literature; for example, string
cleanness techniques associate an integer quantity to every possible target of a
pointer, to represent the distance between the beginning of the pointed object and
the pointed address. These integer quantities are then approximated by the analysis
using some numerical abstraction; with the availability of relational numerical
domains, these methods can be precise but costly [Fra07]. The method that we
present now is attribute independent and it is completely handled by the points-to
domain; the presence of an external numeric domain is assumed only to query for
the value of integer expressions during the evaluation of the pointer arithmetic. Let
$m^\sharp \in \mathrm{Mem}^\sharp$ and consider the expression p + i where

- the expression p is of pointer type and its abstract evaluation results in a location
that is part of an array. We assume that the type of the elements of the array
and the size of the array itself are known.

- The expression i is of integer type and it represents the added offset.

To represent the possible errors that can arise from the concrete evaluation of the
expression p + i, we use the set

\[ \mathrm{RTSErrors} = \{E^-, E^+\}, \]

where $E^-$ denotes the array underflow error and $E^+$ denotes the array overflow
error. To formalize the concrete evaluation of pointer arithmetic expressions, let

\[ \mathrm{PTRARITH} : \mathrm{Mem} \times \mathrm{Expr} \times \mathrm{Expr}
   \rightharpoonup \mathcal{L} \cup \mathrm{RTSErrors} \]

be a partial function defined for every pair of expressions $p, i \in \mathrm{Expr}$
where p is of pointer type and i is of integer type. (We assume that the two sets
$\mathcal{L}$ and RTSErrors have disjoint representations.) Let

\[ \mathrm{PTRARITH} : \wp(\mathrm{Mem}) \times \mathrm{Expr} \times \mathrm{Expr}
   \rightharpoonup \wp(\mathcal{L} \cup \mathrm{RTSErrors}) \]

be its extension to sets of concrete memories defined, for all $M \subseteq \mathrm{Mem}$, as

\[ \mathrm{PTRARITH}(M, p, i) \stackrel{\mathrm{def}}{=}
   \{\, \mathrm{PTRARITH}(m, p, i) \mid m \in M \,\}. \]

A rigorous definition of $\mathrm{PTRARITH}(m, p, i)$ would require a rigorous definition
of the concrete execution model [BHPZ07]; an informal presentation of the concrete
semantics used here is discussed later in Section 5.12.2. To denote the approximation of
the concrete operation PTRARITH we introduce the function

\[ \mathrm{PTRARITH}^\sharp : \mathrm{Mem}^\sharp \times \mathrm{Expr} \times \mathrm{Expr}
   \to \wp(\mathcal{L}^\sharp \cup \mathrm{RTSErrors}). \]

Generally, in an abstract memory description $m^\sharp$, the evaluation of a pointer
expression results in a set of abstract locations. It is however convenient to define the
abstract semantics of the PTRARITH function by working on one abstract location
at a time. Thus, to ease the presentation we introduce the helper function

\[ \mathrm{PTRARITH}^\sharp : \mathrm{Mem}^\sharp \times \mathrm{Loc}^\sharp \times \mathrm{Expr}
   \to \wp(\mathcal{L}^\sharp \cup \mathrm{RTSErrors}) \]

where, given $m^\sharp \in \mathrm{Mem}^\sharp$, $l \in \mathrm{Loc}^\sharp$ and the integer
expression $i \in \mathrm{Expr}$, $\mathrm{PTRARITH}^\sharp(m^\sharp, l, i)$ represents the
set of the possible abstract locations resulting from the addition of the value of i to
the location l in the memory $m^\sharp$. Let again $p, i \in \mathrm{Expr}$; then we define

\[ \mathrm{PTRARITH}^\sharp(m^\sharp, p, i) \stackrel{\mathrm{def}}{=}
   \bigcup \{\, \mathrm{PTRARITH}^\sharp(m^\sharp, l, i) \mid l \in \mathrm{EVAL}(m^\sharp, p) \,\}. \]

To query the numerical domain about the value of the integer expression i we
assume the existence of a function

\[ \mathrm{EVALINT} : \mathrm{Mem}^\sharp \times \mathrm{Expr} \to \wp(\mathbb{Z}) \]

with the following semantics

\[ \mathrm{EVALINT}(m^\sharp, i) \stackrel{\mathrm{def}}{=}
   \{\, z \in \mathbb{Z} \mid \exists m \in \gamma(m^\sharp) \,.\, m[\mathrm{EVAL}(m, i)] = z \,\}. \]

In words, the function EVALINT returns the set of the possible values that the integer
expression i can assume in the concrete memories $m$ approximated by $m^\sharp$.

The function $\mathrm{PTRARITH}^\sharp$ is defined as follows. We first introduce some
notation. Let $S \in \mathbb{N} \setminus \{0\}$ be the size of the array on which we are
performing pointer arithmetic.

    Symbol   Description                  Concrete range
    E-       Array underflow error.       (-∞, -1]
    H        Array head location.         [0, 1)
    T        Array tail location.         [1, S)
    O        Array off-by-one location.   [S, S+1)
    E+       Array overflow error.        [S+1, +∞)

The abstract memory model described in Section 5.3 approximates array variables
using three distinct abstract locations, here denoted by 'H', 'T' and 'O'; we use the
symbols 'E+' and 'E-' to denote the possible exceptional outcomes of the arithmetic
operation due to exceeding the array bounds. (Also in this case we assume that
$\mathcal{L}^\sharp$ and RTSErrors have disjoint representations.) Let
$S \in \mathbb{N} \setminus \{0\}$ be the size of the considered array; we distinguish four
possible cases: $S = 1$, $S = 2$, $S = 3$, $S \ge 4$. Each of these cases is described by
one of the tables below. In each of these tables, the first column contains a set of
intervals of $\mathbb{Z}$ that forms a partition of $\mathbb{Z}$ itself. The first row of
these tables represents instead the three possibilities for the abstract location $l$
supplied to the function $\mathrm{PTRARITH}^\sharp$. Let $D = D(S)$ be the table
corresponding to the location $l$. We denote by 'D.rows' the number of rows of the table
D. For each $n \in \{1, \dots, D.\mathrm{rows}\}$ we denote by 'D.row(n)' the n-th row of
the table D. Given a row R of D we denote by 'R.interval' the interval of $\mathbb{Z}$
associated to R, which is located in the first column. By 'R.loc(l)' we denote the cell at
the intersection of the row R and the column associated to the location $l$: the second
column if $l$ represents the head location H of the array, the third column if $l$
represents the tail location T, or the fourth column if $l$ represents the off-by-one
location O. With this notation, the function $\mathrm{PTRARITH}^\sharp$ can be defined as

\[ \mathrm{PTRARITH}^\sharp(m^\sharp, l, i) \stackrel{\mathrm{def}}{=}
   \bigcup \{\, L(S, n) \mid n \in \{1, \dots, D.\mathrm{rows}\} \,\}; \]

where

\[ R(S, n) \stackrel{\mathrm{def}}{=} D(S).\mathrm{row}(n); \]

\[
L(S, n) \stackrel{\mathrm{def}}{=}
\begin{cases}
R(S, n).\mathrm{loc}(l), & \text{if } R(S, n).\mathrm{interval} \cap \mathrm{EVALINT}(m^\sharp, i) \neq \emptyset; \\
\emptyset, & \text{otherwise.}
\end{cases}
\]



    S = 1         H         T         O
    (-∞, -2]      E-                  E-
    -1            E-                  H
    0             H                   O
    1             O                   E+
    [2, +∞)       E+                  E+

(For S = 1 the T column is empty, since the tail location represents no concrete
location.)

    S = 2         H         T         O
    (-∞, -3]      E-        E-        E-
    -2            E-        E-        H
    -1            E-        H         T
    0             H         T         O
    1             T         O         E+
    2             O         E+        E+
    [3, +∞)       E+        E+        E+

    S = 3         H         T         O
    (-∞, -4]      E-        E-        E-
    -3            E-        E-        H
    -2            E-        E-, H     T
    -1            E-        H, T      T
    0             H         T         O
    1             T         T, O      E+
    2             T         O, E+     E+
    3             O         E+        E+
    [4, +∞)       E+        E+        E+

    S ≥ 4         H         T            O
    (-∞, -S-1]    E-        E-           E-
    -S            E-        E-           H
    1-S           E-        E-, H        T
    [2-S, -2]     E-        E-, H, T     T
    -1            E-        H, T         T
    0             H         T            O
    1             T         T, O         E+
    [2, S-2]      T         T, O, E+     E+
    S-1           T         O, E+        E+
    S             O         E+           E+
    [S+1, +∞)     E+        E+           E+
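The tables above can also be obtained by interval arithmetic on the concrete indices that
each abstract location represents. The C sketch below follows this alternative route
rather than a table lookup, so it should be read as an illustration and not as the
formulation used in the text. It assumes that the numerical domain answers with a single
interval [lo, hi] for the offset, that the array size is known, and that all quantities
are small enough for the additions not to overflow; the function and constant names are
hypothetical.

```c
#include <assert.h>

/* Possible outcomes, combined as a bitmask. */
#define R_UNDERFLOW  (1u << 0)   /* E- */
#define R_HEAD       (1u << 1)   /* H  */
#define R_TAIL       (1u << 2)   /* T  */
#define R_OFF_BY_ONE (1u << 3)   /* O  */
#define R_OVERFLOW   (1u << 4)   /* E+ */

typedef enum { LOC_HEAD, LOC_TAIL, LOC_OFF_BY_ONE } ArrayLoc;

/* Outcomes of adding an offset in [lo, hi] to the abstract location loc of
   an array of the given size. Returns 0 when loc represents no concrete
   location (the tail of an array of size 1). */
unsigned ptrarith_abs(long long size, ArrayLoc loc, long long lo, long long hi) {
    assert(size >= 1 && lo <= hi);
    long long first, last;   /* concrete indices represented by loc */
    switch (loc) {
    case LOC_HEAD: first = 0;    last = 0;        break;
    case LOC_TAIL: first = 1;    last = size - 1; break;
    default:       first = size; last = size;     break;
    }
    if (first > last) return 0;

    /* Every index in [min_idx, max_idx] is a possible result. */
    long long min_idx = first + lo;
    long long max_idx = last + hi;

    unsigned result = 0;
    if (min_idx < 0)                                      result |= R_UNDERFLOW;
    if (min_idx <= 0 && max_idx >= 0)                     result |= R_HEAD;
    if (size >= 2 && min_idx <= size - 1 && max_idx >= 1) result |= R_TAIL;
    if (min_idx <= size && max_idx >= size)               result |= R_OFF_BY_ONE;
    if (max_idx > size)                                   result |= R_OVERFLOW;
    return result;
}
```

For example, for an array of size 5 the call `ptrarith_abs(5, LOC_TAIL, 1, 1)` yields the
set {T, O}, which matches the cell of row 1 and column T in the table for S ≥ 4.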



Since the C language provides various mechanisms to create arrays whose size is
computed at run-time, we ought to consider the case of handling pointer arithmetic
on arrays of unknown size.^20 To handle the case of arrays of unknown size we
compute a merge of the above cases, obtaining the following table.

    S > 0         H            T            O
    (-∞, -2]      E-           E-, H, T     E-, H, T
    -1            E-           H, T         H, T
    0             H            T            O
    1             T, O         T, O         E+
    [2, +∞)       T, O, E+     T, O, E+     E+

     1  const unsigned int N = ...;
     2  int a[N];
     3
     4  void foo(const T* first, const T* last) {
     5    // PP0
     6    while (true) {
     7      // PP1
     8      if (first == last) break;
     9      // PP2
    10      ... = *first;
    11      // PP3
    12      ++first;
    13    }
    14    // PP5
    15  }

Listing 34: an example of a simple loop that depends on the pointer arithmetic
computation.



5.12.1 Examples. The following examples illustrate the described method applied
to Listing 34. For convenience of notation we represent the steps of the
computation using a table with two columns: the first column shows the program
point currently executed and the second column shows the abstract value of the
variable 'first'; note indeed that the value of the pointer variable 'last' is never
changed by the execution of the function 'foo'. Since the 'foo' function contains a
loop, the abstract computation terminates when a fixpoint is reached; to separate
the different iterations of the loop analysis we use horizontal lines. In the last row of
the table we show the result of the merge of all the exit states of the loop. For
simplicity of presentation we assume that the array 'a' declared at line 2 contains
at least four elements.

Example 20. Consider the call 'foo(a, a + N)'. In the concrete domain the
expression 'a + N' evaluates to the address one-past-the-end of the array 'a', which
in the abstract domain corresponds to the off-by-one abstract location O. During
the whole execution of the 'foo' function we have $\mathrm{EVAL}(m^\sharp, last) = \{O\}$.
Instead, the expression 'a' evaluates to the address of the beginning of the array 'a',
which in the abstract domain corresponds to the head abstract location H. Thus, at the
entry point of 'foo', the expression 'first' evaluates to H. These are the steps of the
execution:

    PP    EVAL(m♯, first)
    0     H
    ----------------------------------
    1     H         (1st)
    5     ⊥
    2     H
    3     H
    ----------------------------------
    1     T         (2nd)
    5     ⊥
    2     T
    3     T
    ----------------------------------
    1     T, O      (3rd) Fixpoint
    5     O
    2     T
    3     T
    ----------------------------------
    5     O

Note that the filter on the guard condition of the loop 'first == last' at line 8 is
able to split the points-to information

\[ \{(first, a.T), (first, a.O)\} \]

into $\{(first, a.T)\}$ for the else branch, which represents the continuation of the
loop, and into $\{(first, a.O)\}$ for the then branch, which represents the execution
paths that exit from the loop. In this case the analysis finds the fixpoint of the loop
without signalling any error due to the pointer arithmetic; that is, it is able to prove
the absence of errors in the execution of the loop.

^20 Or of partially unknown size. For example, the analysis could be able to determine
some approximation of the value used to specify the size of the array during its
allocation.

Example 21. Consider the call 'foo(a, a)'. During the whole execution we have

\[ \mathrm{EVAL}(m^\sharp, last) = \{H\}. \]

These are the steps of the execution:

    PP    EVAL(m♯, first)
    0     H
    ----------------------------------
    1     H         (1st) Fixpoint
    5     H
    2     ⊥         (unreachable)
    ----------------------------------
    5     H

In this case the analysis is able to prove that the execution exits immediately from
the loop without modifying the value of 'first' and without any error.

Example 22. Consider the call 'foo(a + N, a + N)'. This case is very similar
to the previous one. During the execution we have
$\mathrm{EVAL}(m^\sharp, last) = \{O\}$. These are the steps of the execution:

    PP    EVAL(m♯, first)
    0     O
    ----------------------------------
    1     O         (1st) Fixpoint
    5     O
    2     ⊥         (unreachable)
    ----------------------------------
    5     O

Note that at the first iteration of the loop the filter is able to prove that 'first' and
'last' are definitely aliases. Also in this case the analysis is able to prove that the
execution exits immediately from the loop without modifying the value of 'first'
and without any error.

Example 23. Consider the call 'foo(a + N, a)'. During the execution we have

\[ \mathrm{EVAL}(m^\sharp, last) = \{H\}. \]

These are the steps of the execution:

    PP    EVAL(m♯, first)
    0     O
    ----------------------------------
    1     O         (1st)
    5     ⊥
    2     O
    3     O         (+ Dereference warning)
    ----------------------------------
    1     E+, ⊥     (2nd) Fixpoint
    5     ⊥

In this case the analysis is able to detect that in the first iteration of the loop, at
program point 3, an off-by-one location is dereferenced. Depending on the concrete
execution model adopted, the analyzer may or may not assume that the concrete execution
terminates. In the latter case the analysis is able to prove that during the next
iteration of the loop the pointer 'first' is incremented beyond the legal bounds of
the array.

Example 24. Consider the calls 'foo(a + 4, a + 6)', 'foo(a + 5, a + 5)'
and 'foo(a + 6, a + 4)', which have the same abstraction. Indeed the expressions
'a + 4', 'a + 5', 'a + 6', and more generally the expressions 'a + i' with
$i \in \{1, \dots, N - 1\}$, all evaluate in the abstract memory to the tail location T of
the array 'a'. These are the steps of the execution:

    PP    EVAL(m♯, first)
    0     T
    ----------------------------------
    1     T         (1st)
    5     T
    2     T
    3     T
    ----------------------------------
    1     T, O ★    (2nd) Fixpoint
    5     T
    2     T, O
    3     T         (+ Dereference warning)
    ----------------------------------
    5     T

Note that a.T is not singular; thus, the filter at the guard of the loop cannot remove
the arc (first, a.T) from the else branch. Then at program point 2 we still find
{T, O}. The above table represents the case in which the execution model forbids
dereferencing pointers to the off-by-one location of an array. In this case, when
the abstract execution reaches program point 3 in the last iteration of the loop, the
analyzer filters away the off-by-one location from the possible targets of 'first'
and raises a warning. Here the analysis successfully detects the possibility of
an error: indeed there exists at least one concrete execution in which the off-by-one
location is dereferenced. Otherwise, if the analyzer accepts as valid the dereference
of the off-by-one location at line 3, we would obtain

    3     T, O
    1     T, O ★    (+ E+)

That is, the analysis detects that the increment of 'first' at line 12 can produce an
error due to exceeding the array bounds.
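The behaviour of the filter on the guard 'first == last' in Examples 20, 21 and 24 can be
reproduced with a small C sketch. It is an assumed simplification, not the filter
operation of the paper: the targets of each pointer are a bitmask over the abstract
locations H, T and O of one array, a second bitmask lists the locations known to be
singular, and only the value of 'first' is refined.

```c
#include <assert.h>
#include <stdbool.h>

/* Abstract locations of the array, used as bits of a mask. */
#define LOC_H (1u << 0)
#define LOC_T (1u << 1)
#define LOC_O (1u << 2)

typedef struct {
    unsigned then_first;  /* targets of 'first' when the guard holds      */
    unsigned else_first;  /* targets of 'first' when the guard fails      */
} FilterResult;

/* Filters the targets of 'first' on the condition first == last. */
FilterResult filter_eq(unsigned first, unsigned last, unsigned singular) {
    FilterResult r;
    /* Equal pointers must share a target. */
    r.then_first = first & last;
    /* By default nothing can be removed from the else branch. */
    r.else_first = first;
    /* If 'last' has exactly one target and that target is singular, it
       denotes exactly one concrete location, so a different pointer
       cannot point to it. */
    bool last_single = last != 0 && (last & (last - 1)) == 0;
    if (last_single && (last & singular) != 0)
        r.else_first = first & ~last;
    return r;
}
```

With H and O singular and T non-singular, filtering {T, O} against last = {O} gives {O}
for the then branch and {T} for the else branch, as in Example 20; filtering {T, O}
against last = {T} gives {T} and {T, O}, as in Example 24.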

Note that this model is symmetrical with respect to the direction of the increasing 
indices — the only difference is that the off-by-one location cannot be dereferenced, 
while the head location H can. 

5.12.2 Derivation of the Rules. This section provides the reader with a justification
of the presented rules for the handling of pointer arithmetic. However, in this
case the concepts are intuitive and the additional burden required to introduce a
rigorous model to describe the rules is not worth the effort. Therefore, we limit the
presentation to an informal justification of some of the cases, with the conviction
that the remaining cases can be deduced similarly. Consider the case of an array
whose elements are of scalar type $t \in \mathrm{Types}$ and which contains at least four
elements, that is, $S \ge 4$. Under these assumptions, the concrete allocation block
generated by $t[S]$ is

\[ \mathrm{ALLOC}(t[S]) = [l_0, l_1, \dots, l_{S-1}, l_S], \]

where the last location of the sequence, $l_S$, represents the off-by-one location of
the array. To this concrete allocation block corresponds the following abstract
allocation block

\[ \mathrm{ALLOC}^\sharp(t[S]) = [H, T, O]. \]

In this sense we can say that

\[ \gamma(H) = [l_0], \qquad \gamma(T) = [l_1, \dots, l_{S-1}], \qquad \gamma(O) = [l_S]. \]

Let $p \in \mathrm{Expr}$, let $A$ be an abstract memory and let $C \in \gamma(A)$.

- Consider the case $\mathrm{EVAL}(C, p) = \{l_0\}$. Since $C$ is approximated by $A$ and $\gamma(H) =
[l_0]$, we have that $H \in \mathrm{EVAL}(A, p)$. In the concrete model, if we move below the
location $l_0$ we cross the boundaries of the array, triggering undefined behaviour.
In the abstract model we approximate this with $E^-$, which denotes array underflow.
If we move above the location $l_0$ by $n \in \mathbb{N}$ positions, with $n \leq S$, we reach
the concrete location $l_n$. In the abstract model, starting from the head abstract
location $H$ and adding $n$, with $n \in [1, S-1]$, we reach the tail location $T$;
for $n = S$ the off-by-one location $O$ is reached. If we move above the
location $l_0$ by $n \in \mathbb{N}$ positions, with $n > S$, we trespass the boundaries of the
array, producing an error that we abstract with $E^+$. Summing up, for
the concrete model we have

  Offset                  $l_0$ + Offset
  $(-\infty, 0)$          Error: array underflow.
  $0$                     $l_0$
  ...                     ...
  $S - 1$                 $l_{S-1}$
  $S$                     $l_S$
  $[S + 1, +\infty)$      Error: array overflow.

and its abstraction is

  Offset                  $H$ + Offset
  $(-\infty, 0)$          $E^-$
  $0$                     $H$
  $[1, S - 1]$            $T$
  $S$                     $O$
  $[S + 1, +\infty)$      $E^+$


- In case we start from the off-by-one location $l_S$, that is $\mathrm{EVAL}(C, p) = \{l_S\}$, in the
abstract model we have $O \in \mathrm{EVAL}(A, p)$. This case is symmetrical to the
case of starting from the head location. For the concrete model we have

  Offset                  $l_S$ + Offset
  $(-\infty, -S)$         Error: array underflow.
  $-S$                    $l_0$
  $1 - S$                 $l_1$
  ...                     ...
  $-1$                    $l_{S-1}$
  $0$                     $l_S$
  $[1, +\infty)$          Error: array overflow.

and its abstraction is

  Offset                  $O$ + Offset
  $(-\infty, -S)$         $E^-$
  $-S$                    $H$
  $[1 - S, 0)$            $T$
  $0$                     $O$
  $[1, +\infty)$          $E^+$

- We now consider all the cases $\mathrm{EVAL}(C, p) = \{l_n\}$ with $n \in [1, S-1]$, as these cases
have the same abstraction. All the concrete locations $l_1, \dots, l_{S-1}$ are indeed
abstracted by the same abstract location $T$. The difference with respect to the
two previous cases is that when we perform pointer arithmetic on the tail of an
array we do not know on which concrete location we are working: there is indeed
a set of possible locations. This means, for instance, that if we move from $l_1$
by an offset of 1 we reach $l_2$, which is still in the tail; but starting from $l_{S-1}$ we
obtain $l_S$, which is the off-by-one location $O$. From this reasoning the results
presented in the following tables can be easily derived.

  Offset              $l_1$ + Offset    ...    $l_{S-1}$ + Offset
  $(-\infty, -S]$     underflow         ...    underflow
  $1 - S$             underflow         ...    $l_0$
  $2 - S$             underflow         ...    $l_1$
  ...
  $-2$                underflow         ...    $l_{S-3}$
  $-1$                $l_0$             ...    $l_{S-2}$
  $0$                 $l_1$             ...    $l_{S-1}$
  $1$                 $l_2$             ...    $l_S$
  $2$                 $l_3$             ...    overflow
  ...
  $S - 2$             $l_{S-1}$         ...    overflow
  $S - 1$             $l_S$             ...    overflow
  $[S, +\infty)$      overflow          ...    overflow

and its abstraction is

  Offset              $T$ + Offset
  $(-\infty, -S]$     $E^-$
  $1 - S$             $E^-, H$
  $[2 - S, -2]$       $E^-, H, T$
  $-1$                $H, T$
  $0$                 $T$
  $1$                 $T, O$
  $[2, S - 2]$        $T, O, E^+$
  $S - 1$             $O, E^+$
  $[S, +\infty)$      $E^+$
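
The three case analyses can be reproduced mechanically. The following C sketch is only an illustration and not the analyzer's implementation: it encodes the outcomes $E^-$, H, T, O, $E^+$ as bits (the names `R_UNDER`, `abstract_index` and `abstract_add` are hypothetical), enumerates the concrete indices abstracted by the starting location, and joins the abstractions of the shifted indices. It assumes $S \geq 4$ and offsets small enough not to overflow a `long`.

```c
#include <assert.h>

/* Abstract outcomes of pointer arithmetic, encoded as a bit set. */
enum {
    R_UNDER = 1 << 0, /* E-: array underflow           */
    R_HEAD  = 1 << 1, /* H : location l_0              */
    R_TAIL  = 1 << 2, /* T : locations l_1 .. l_{S-1}  */
    R_OFF   = 1 << 3, /* O : off-by-one location l_S   */
    R_OVER  = 1 << 4  /* E+: array overflow            */
};

/* Abstraction of the concrete index i of an array with `size` elements. */
static unsigned abstract_index(long i, long size) {
    if (i < 0) return R_UNDER;
    if (i == 0) return R_HEAD;
    if (i < size) return R_TAIL;
    if (i == size) return R_OFF;
    return R_OVER;
}

/* Adds `offset` to every concrete index abstracted by `start`
   (exactly one of R_HEAD, R_TAIL, R_OFF) and joins the results. */
unsigned abstract_add(unsigned start, long offset, long size) {
    long lo, hi;
    assert(size >= 4);
    switch (start) {
    case R_HEAD: lo = hi = 0; break;
    case R_TAIL: lo = 1; hi = size - 1; break;
    case R_OFF:  lo = hi = size; break;
    default: assert(!"start must be R_HEAD, R_TAIL or R_OFF"); return 0;
    }
    unsigned result = 0;
    for (long i = lo; i <= hi; ++i)
        result |= abstract_index(i + offset, size);
    return result;
}
```

For instance, with S = 10 the call `abstract_add(R_TAIL, -1, 10)` yields `R_HEAD | R_TAIL`, which agrees with the row for offset -1 in the table for the tail location.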



Composing these three cases we obtain the complete table of the abstract pointer
arithmetic rules for the case $S \geq 4$.

5.13 Relational Operators

Omitted above for simplicity of notation, one of the possible
extensions to the filter operation is in some sense bound to the handling
of pointer arithmetic. In particular, we now want to consider the use of relational
operators ('<=', '<' and their symmetric counterparts) and their interaction with the
points-to problem. We report here the statement of the C Standard about the use of
relational operators between pointers [Int99, 6.5.8.5]:

If the objects pointed to are members of the same aggregate object, point- 
ers to structure members declared later compare greater than pointers to 
members declared earlier in the structure, and pointers to array elements 
with larger subscript values compare greater than pointers to elements 
of the same array with lower subscript values. [. . . ] If the expression P 
points to an element of an array object and the expression Q points to 
the last element of the same array object, the pointer expression Q+1 
compares greater than P. In all other cases, the behavior is undefined. 

Recalling the simplified model introduced in Section 3, we need to extend the set of
possible operators {eq, neq} to include the additional operators of interest.
Once the set Cond has been augmented with the new conditions, we have to define a proper
concrete semantics for the new elements. Formally, this requires the definition of
a partial order on the set of location addresses. This partial order should satisfy
the requirements of the C Standard reported above. Using the terminology of
the extended memory model presented in Section 5.3, we can say that this order is
required to be defined only between locations that belong to the same allocation
block. In this model we have defined an allocation block as a sequence
of locations, with the order of the locations within the block
reflecting the actual memory layout. Under these assumptions it is possible to define
the required partial order as the order specified by the allocation blocks; in this way we are
able to correctly describe the semantics of the C Standard not only for pointers to
arrays but also for pointers to structure members.

Now, using the notation introduced in Section 3, assume that the needed strict partial
order on the set of locations, denoted '<', has already been defined, with
$\mathord{<} \subseteq \mathcal{L} \times \mathcal{L}$.
Consider the following extension of the concrete execution model. Starting from
Definition 3.8, we extend the set of conditions 'Cond' by adding to the set of possible
operators the element 'lt', which represents the 'less-than' operator of the C language:

$$\mathrm{Cond} \overset{\mathrm{def}}{=} \{\mathrm{eq}, \mathrm{neq}, \mathrm{lt}\} \times \mathrm{Expr} \times \mathrm{Expr}.$$

We also need to extend Definition 3.9 to cover the newly added elements of
'Cond'. Let $\mathrm{TrueCond} \subseteq \mathcal{C} \times \mathrm{Cond}$ be extended, for all $C \in \mathcal{C}$ and $e, f \in \mathrm{Expr}$, as

$$(C, (\mathrm{lt}, e, f)) \in \mathrm{TrueCond} \iff \mathrm{EVAL}(C, e) < \mathrm{EVAL}(C, f).$$

Now we present a possible extension of the abstract filter operation (Definition 3.18)
for handling the relational operator 'lt'. (For simplicity of exposition we treat
explicitly only the operator 'lt' and omit the other relational operators, whose
formalization can be deduced from the formalization of 'lt' by symmetry and by
composition with the equality operator.)

Definition 5.16. (Filter on the less-than operator.) Let

$$\phi : \mathcal{A} \times \mathrm{Cond} \to \mathcal{A}$$

be defined as follows. Let $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$. Let

$$E \overset{\mathrm{def}}{=} \{\, l \in \mathrm{EVAL}(A, e) \mid \exists m \in \mathrm{EVAL}(A, f) \,.\, l < m \,\};$$

$$F \overset{\mathrm{def}}{=} \{\, m \in \mathrm{EVAL}(A, f) \mid \exists l \in \mathrm{EVAL}(A, e) \,.\, l < m \,\};$$

then

$$\phi(A, (\mathrm{lt}, e, f)) \overset{\mathrm{def}}{=} \phi(A, E, e) \cap \phi(A, F, f).$$

Note, however, that we have to consider separately the possible exceptional outcomes
due to the comparison between incompatible locations: as reported above, the
C Standard states that the order '<' is defined only between addresses of the same
object or, using our nomenclature, between locations of the same allocation block;
in all other cases the behaviour is undefined. Listings 35 and 36 are two examples
of the application of the filter on the relational operator 'less-than'.
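
The computation of the target sets E and F can be sketched in C as follows. This is an illustration under stated assumptions, not the analyzer's code: a location is encoded as a hypothetical pair (allocation block, position in the block), the order '<' is taken to be defined only inside one block, and the names `Loc`, `loc_less` and `lt_targets` are invented for the example. The refinement of the abstract memory by the filter itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: allocation block id plus position in the block. */
typedef struct { int block; int pos; } Loc;

/* Strict order '<': holds only between locations of the same block. */
static bool loc_less(Loc l, Loc m) {
    return l.block == m.block && l.pos < m.pos;
}

/* Copies into `out` every x in `xs` that is related to some y in `ys`:
   x < y when keep_smaller is true (set E), y < x otherwise (set F).
   Returns the number of elements written; `out` needs room for n_xs items. */
size_t lt_targets(const Loc *xs, size_t n_xs, const Loc *ys, size_t n_ys,
                  bool keep_smaller, Loc *out) {
    size_t n_out = 0;
    assert(xs != NULL && ys != NULL && out != NULL);
    for (size_t i = 0; i < n_xs; ++i) {
        for (size_t j = 0; j < n_ys; ++j) {
            bool related = keep_smaller ? loc_less(xs[i], ys[j])
                                        : loc_less(ys[j], xs[i]);
            if (related) { out[n_out++] = xs[i]; break; }
        }
    }
    return n_out;
}
```

With the targets {t0.a, t1.b, t2.a} for p and {t0.b, t1.b, t3.b} for q taken from the first branch of Listing 36, the call with `keep_smaller` set to true on (p, q) keeps only t0.a, and the call with `keep_smaller` set to false on (q, p) keeps only t0.b.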

5.13.1 Justification of the Definition. We now want to provide an intuitive description
of the motivations behind the presented definition of the filter for the 'less-than'
operator. Let again $(\mathrm{lt}, e, f) \in \mathrm{Cond}$, $A \in \mathcal{A}$ and let

$$C \in \gamma(A) \cap \mathrm{MODELSET}((\mathrm{lt}, e, f)) = \phi(\gamma(A), (\mathrm{lt}, e, f)).$$

We know, from our definition of the concrete semantics of the operator 'lt', that

$$C \in \mathrm{MODELSET}((\mathrm{lt}, e, f)) \implies \mathrm{EVAL}(C, e) < \mathrm{EVAL}(C, f).$$

Basically, since $A$ is an abstraction of $C$, the value of $f$ in $C$ is
approximated by the value of $f$ in $A$. The same holds for the expression $e$. This
means that the sets $E$ and $F$ contain the values of $e$ and $f$ in $C$, respectively; hence $C$
is also approximated by $\phi(A, (\mathrm{lt}, e, f))$. For completeness we also report a formal
proof of the correctness of the above definition. First we prove an analogue of
Lemma 3.51 for the 'less-than' operator; then we extend the proof of Theorem 3.27
to the 'lt' operator.

Lemma 5.17. (Less-than target.) Let $A \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$; let

$$E \overset{\mathrm{def}}{=} \{\, l \in \mathrm{EVAL}(A, e) \mid \exists m \in \mathrm{EVAL}(A, f) \,.\, l < m \,\};$$

$$F \overset{\mathrm{def}}{=} \{\, m \in \mathrm{EVAL}(A, f) \mid \exists l \in \mathrm{EVAL}(A, e) \,.\, l < m \,\};$$

then, for all $C \in \phi(\gamma(A), (\mathrm{lt}, e, f))$, it holds that

$$\mathrm{EVAL}(C, e) \subseteq E \land \mathrm{EVAL}(C, f) \subseteq F.$$

Proof. Let $c = (\mathrm{lt}, e, f) \in \mathrm{Cond}$, $A \in \mathcal{A}$ and let $C \in \gamma(A) \cap \mathrm{MODELSET}(c)$. Let $E$
and $F$ be defined as in the statement of this lemma. Recall that, from the concrete
semantics of the operator 'lt' described above, $C \in \mathrm{MODELSET}(c)$
implies that

$$\mathrm{EVAL}(C, e) < \mathrm{EVAL}(C, f).$$

Then we have






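  TS   $\mathrm{EVAL}(C, e) \subseteq E \land \mathrm{EVAL}(C, f) \subseteq F$
  H0   $C \in \gamma(A)$
  H1   $\mathrm{EVAL}(C, e) = \{l_0\}$
  H2   $\mathrm{EVAL}(C, f) = \{m_0\}$
  H3   $l_0 < m_0$
  H4   Lemma 3.31, monotonicity of eval.
  H5   Definition 3.2, the concretization function.
  D0   $C \subseteq A$                                                  (H0, H5)
  D1   $\mathrm{EVAL}(C, e) \subseteq \mathrm{EVAL}(A, e)$              (D0, H4)
  D2   $\mathrm{EVAL}(C, f) \subseteq \mathrm{EVAL}(A, f)$              (D0, H4)
  D3   $\exists l \in \mathrm{EVAL}(A, e) \,.\, l < m_0$                (H3, H1, D1)
  D4   $\exists m \in \mathrm{EVAL}(A, f) \,.\, l_0 < m$                (H3, H2, D2)
  D5   $\mathrm{EVAL}(C, e) \subseteq E$                                (D4, H1)
  D6   $\mathrm{EVAL}(C, f) \subseteq F$                                (D3, H2)
  ✓    $\mathrm{EVAL}(C, e) \subseteq E \land \mathrm{EVAL}(C, f) \subseteq F$   (D5, D6)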

□ 

Proof. (Correctness of the filter on the less-than.) Let $A \in \mathcal{A}$, let $c =
(\mathrm{lt}, e, f) \in \mathrm{Cond}$ and let $C \in \mathcal{C}$. Let $E$ and $F$ be defined as in Definition 5.16.

  TS   $C \in \gamma(\phi(A, c))$
  H0   $C \models c$
  H1   $C \in \gamma(A)$
  H2   Lemma 3.50, correctness of filter 2.
  H3   Definition 5.16, filter on the less-than.
  H4   Definition 3.2, concretization function.
  H5   Lemma 5.17, the less-than target.
  D0   $\mathrm{EVAL}(C, e) \subseteq E$                                (H5, H1, H0)
  D1   $\mathrm{EVAL}(C, f) \subseteq F$                                (H5, H1, H0)
  D2   $C \in \gamma(\phi(A, E, e))$                                    (D0, H1, H2)
  D3   $C \in \gamma(\phi(A, F, f))$                                    (D1, H1, H2)
  D4   $C \subseteq \phi(A, E, e)$                                      (D2, H4)
  D5   $C \subseteq \phi(A, F, f)$                                      (D3, H4)
  D6   $\phi(A, c) = \phi(A, E, e) \cap \phi(A, F, f)$                  (H3)
  D7   $C \subseteq \phi(A, E, e) \cap \phi(A, F, f)$                   (D5, D4)
  D8   $C \subseteq \phi(A, c)$                                         (D7, D6)
  ✓    $C \in \gamma(\phi(A, c))$                                       (D8, H4)

□

Note that the structure of the proof of the correctness of the filter on the less-than
operator is very similar to the structure of the proof for the equality case:
the only difference is the definition of the target sets E and F.

5.14 Special Locations 

    int a[10], b[20], c[30], d[40], *p, *q;

    if (...) {
      ...
      // eval(*p) = {a.H, b.H, c.H}
      ...
      if (p < q) {
        // eval(*p) = {a.H, b.H}
        // eval(*q) = {a.T, b.T}
      } else { /* Unreachable */ }
    } else {
      if (...) {
        ...
      } else { p = b + 10; q = b + 20; }

      ...
      // eval(*q) = {a.O, b.O, c.H}

      if (p < q) {
        // eval(*p) = {a.T, b.T}
        // eval(*q) = {a.O, b.O}
      } else { /* Unreachable */ }
    }

    // eval(*p) = {a.H, a.T, b.H, b.T}
    // eval(*q) = {a.T, a.O, b.T, b.O}

    if (p < q) { /* The same */ }
    else { /* The same */ }

Listing 35: an example of analysis of a program involving pointer arithmetic and
filtering on relational operators.

One of the simplifications introduced in the model of Section 3 is that all locations
are treated in the same way. In particular, in the definition of the abstract evaluation
function (Definition 3.6) and of the assignment operation (Definition 3.10)
there are no limitations on the locations that can be dereferenced or modified. However,
a realistic memory model should provide a way to limit the possible operations
on some locations. For instance, consider a null pointer. The C Standard
specifies that dereferencing a null pointer produces undefined behaviour. From
[Int99, 6.5.3.2.4]:

The unary * operator denotes indirection. [...] If an invalid value has
been assigned to the pointer, the behavior of the unary * operator is
undefined. [...] Among the invalid values for dereferencing a pointer
by the unary * operator are a null pointer, an address inappropriately
aligned for the type of object pointed to, and the address of an object
after the end of its lifetime.

    struct T { int a, b, c; } t0, t1, t2, t3;
    int *p, *q;

    if (...) {
      if (...) {
        if (...) { p = &t0.a; q = &t0.b; }
        else     { p = &t2.a; q = &t3.b; }
      } else     { p = &t1.b; q = &t1.b; }

      // eval(*p) = {t0.a, t1.b, t2.a}
      // eval(*q) = {t0.b, t1.b, t3.b}

      if (p < q) {
        // eval(*p) = {t0.a}
        // eval(*q) = {t0.b}
      } else {
        // eval(*p) = eval(*q) = {t1.b}
      }
    } else {
      if (...) {
        if (...) { p = &t0.b; q = &t0.c; }
        else     { p = &t2.b; q = &t3.c; }
      } else     { p = &t1.a; q = &t1.c; }

      // eval(*p) = {t0.b, t1.a, t2.b}
      // eval(*q) = {t0.c, t3.c}

      if (p < q) {
        // eval(*p) = {t0.b, t1.a}
        // eval(*q) = {t0.c, t1.b}
      } else { /* Unreachable */ }
    }

    // eval(*p) = {t0.a, t0.b, t1.a, t1.b}
    // eval(*q) = {t0.b, t0.c, t1.b}

    if (p < q) {
      // eval(*p) = {t0.a, t0.b, t1.a}
      // eval(*q) = {t0.b, t0.c, t1.b}
    } else {
      // eval(*p) = eval(*q) = {t0.b, t1.b}
    }

Listing 36: an example of analysis of a program involving pointer arithmetic and
filtering on relational operators.

In other languages, like Java, dereferencing a null reference throws an exception.
Apart from the different responses of each language, it is quite common
for a language to have its own set of configurations that are considered exceptional and
treated in an ad-hoc way. Consider for instance the case of uninitialized variables:
it would be possible to formalize a concrete semantics where uninitialized variables,
or pointers to a deallocated memory area, cannot be evaluated and hence
cannot be copied. Though this kind of conformance is uncommon in "every-day" programs,
there exist application areas that require these restrictions [Mot04, Rule 9.1] [Loc05].
Note that the general idea is to capture some classes of exceptional behaviours,
though the specific definition of what is exceptional can vary, even within the same
language. This section presents a possible extension of the model presented in
Section 3 that can be used to represent the described concrete semantics. We
introduce two sets of locations.

- Let $\mathrm{NonEval} \subseteq \mathcal{L}$ be the set of non-evaluable locations. Informally,
trying to evaluate a non-evaluable location results in an error.

- Let $\mathrm{NonDeref} \subseteq \mathcal{L}$ be the set of non-dereference-able locations. Informally, trying
to apply the dereference operator to a location of this set results in an error.

To represent the possible run-time errors we use the set

$$\mathrm{RTSErrors} \overset{\mathrm{def}}{=} \{\mathrm{DerefError}, \mathrm{EvalError}\}.$$

The concrete behaviour can be described by defining an extended version of the
evaluation function (Definition 3.6). Let

$$\mathrm{EVAL}_e : \mathcal{C} \times \mathrm{Expr} \to \mathcal{L} \cup \mathrm{RTSErrors}$$

be the total function defined, for all $C \in \mathcal{C}$, $l \in \mathcal{L}$ and $e \in \mathrm{Expr}$, as

$$\mathrm{EVAL}_e(C, l) \overset{\mathrm{def}}{=}
\begin{cases}
\mathrm{EvalError}, & \text{if } l \in \mathrm{NonEval}; \\
l, & \text{otherwise};
\end{cases}$$

$$\mathrm{EVAL}_e(C, *e) \overset{\mathrm{def}}{=}
\begin{cases}
\mathrm{EVAL}_e(C, e), & \text{if } T \in \mathrm{RTSErrors}; \\
\mathrm{DerefError}, & \text{if } T \in \mathrm{NonDeref}; \\
\mathrm{EvalError}, & \text{if } \mathrm{post}(C, T) \in \mathrm{NonEval}; \\
\mathrm{post}(C, T), & \text{otherwise};
\end{cases}$$

where $T = \mathrm{EVAL}_e(C, e)$. (Here we assume that the two sets $\mathcal{L}$ and RTSErrors
have disjoint representations.)

Note that we have formalized the new evaluation function by tagging the exceptional
paths with the elements of the set 'RTSErrors'. An implementation of the execution
model proposed here will handle these exceptional cases by signalling an error and
terminating the execution, by raising an exception and modifying the execution
mode, or in whatever other way is considered appropriate. This operation can be generalized
to sets as follows. For every element in the result of the concrete evaluation, we
want to track the corresponding concrete memory description. Also, we want to
explicitly separate the exceptional and the normal components. Let

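$$\mathrm{EVAL}_e : \wp(\mathcal{C}) \times \mathrm{Expr} \to \wp(\mathcal{L} \times \mathcal{C}) \times \wp(\mathrm{RTSErrors} \times \mathcal{C})$$

be the total function that, for all $C \subseteq \mathcal{C}$ and $e \in \mathrm{Expr}$, returns the pairs
(location, memory description) produced by the normal evaluations and the pairs
(error, memory description) produced by the exceptional ones.

The abstract counterpart of the operation can thus be defined as

$$\mathrm{EVAL}^\sharp_e : \mathcal{A} \times \mathrm{Expr} \to \wp(\mathcal{L}) \times \mathcal{A} \times \wp(\mathrm{RTSErrors}) \times \mathcal{A}.$$

Note that we are simplifying a little: indeed we assume to approximate elements
of $\wp(\mathcal{L} \times \mathcal{C})$ with elements of the product $\wp(\mathcal{L}) \times \mathcal{A}$, and elements of $\wp(\mathrm{RTSErrors} \times \mathcal{C})$ with
elements of $\wp(\mathrm{RTSErrors}) \times \mathcal{A}$; this is not completely general, but it is sufficient
for our goals. Given $A \in \mathcal{A}$ and $e \in \mathrm{Expr}$ we write

$$\mathrm{EVAL}^\sharp_e(A, e) = (L, B, E, C),$$

where $L \subseteq \mathcal{L}$, $B, C \in \mathcal{A}$ and $E \subseteq \mathrm{RTSErrors}$, to mean that the abstract evaluation
of the expression $e$ results in the set of abstract locations $L$ and the set of errors
$E$; $B$ is an approximation of the abstract memory that generates $L$ and $C$ is an
approximation of the abstract memory that generates $E$. The requirements for the
soundness of the abstract operation are the following. Let, for all $A \in \mathcal{A}$ and
$e \in \mathrm{Expr}$,

$$\mathrm{EVAL}^\sharp_e(A, e) = (L, B, E, C);$$

$$\mathrm{EVAL}_e(\gamma(A), e) = (R_0, R_1);$$

then, to be sound, the abstract operation must satisfy the following requirements:

$$\{\, l \mid (l, b) \in R_0 \,\} \subseteq L;$$

$$\{\, b \mid (l, b) \in R_0 \,\} \subseteq \gamma(B);$$

$$\{\, \epsilon \mid (\epsilon, c) \in R_1 \,\} \subseteq E;$$

$$\{\, c \mid (\epsilon, c) \in R_1 \,\} \subseteq \gamma(C).$$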

Let $A \in \mathcal{A}$ and $l \in \mathcal{L}$; then, for the base case, let

$$\mathrm{EVAL}^\sharp_e(A, l) \overset{\mathrm{def}}{=}
\begin{cases}
(\emptyset, \bot, \{\mathrm{EvalError}\}, A), & \text{if } l \in \mathrm{NonEval}; \\
(\{l\}, A, \emptyset, \bot), & \text{otherwise}.
\end{cases}$$

For the inductive case, let $e \in \mathrm{Expr}$ and

$$\begin{aligned}
\mathrm{EVAL}^\sharp_e(A, e) &= (L_0, A_0, E_0, B_0), \\
L_{0,X} &= L_0 \cap \mathrm{NonDeref}; \\
L_{0,N} &= L_0 \setminus \mathrm{NonDeref}; \\
A_{0,N} &= \phi(A_0, e, L_{0,N}); \\
A_{0,X} &= \phi(A_0, e, L_{0,X}); \\
L_1 &= \mathrm{POST}(A_{0,N}, L_{0,N}); \\
L_{1,X} &= L_1 \cap \mathrm{NonEval}; \\
L_{1,N} &= L_1 \setminus \mathrm{NonEval}; \\
A_{1,X} &= \phi(A_{0,N}, *e, L_{1,X}); \\
A_{1,N} &= \phi(A_{0,N}, *e, L_{1,N});
\end{aligned}$$

and

$$E_{0,X} =
\begin{cases}
\{\mathrm{DerefError}\}, & \text{if } L_{0,X} \neq \emptyset; \\
\emptyset, & \text{otherwise};
\end{cases}
\qquad
E_{1,X} =
\begin{cases}
\{\mathrm{EvalError}\}, & \text{if } L_{1,X} \neq \emptyset; \\
\emptyset, & \text{otherwise};
\end{cases}$$

$$E = E_0 \cup E_{0,X} \cup E_{1,X}.$$

Finally,

$$\mathrm{EVAL}^\sharp_e(A, *e) = (L_{1,N}, A_{1,N}, E, B_0 \cup A_{0,X} \cup A_{1,X}).$$

In words, to evaluate $*e$ we

(1) evaluate $e$,

(2) filter away the non-dereference-able locations,

(3) perform the actual dereference,

(4) filter away the non-evaluable locations.

We can have an error if the evaluation of $e$ produces an error ($B_0$), or if we
obtain a non-dereference-able location ($A_{0,X}$), or if in the last step we obtain a
non-evaluable location ($A_{1,X}$). We have a location if all these steps are error free
($A_{1,N} \subseteq A_{0,N} \subseteq A_0 \subseteq A$). Note that in the computation of $A_{0,N} = \phi(A_0, e, L_{0,N})$
the filter cannot always remove all the non-dereference-able locations from the
result of $\mathrm{EVAL}(A_{0,N}, e)$. Note however that we compute the result of the dereference
operator, $L_1 = \mathrm{POST}(A_{0,N}, L_{0,N})$, on the set $L_{0,N}$, which by definition does not
contain non-dereference-able locations.

The formulation of the assignment operator can also have its own class of special
locations. For instance, it is possible to define a set of non-modifiable (read-only)
locations. On finding a read-only location in the result of the evaluation of the lhs,
the analysis reacts by removing that location and signalling an error. For example,
in our analyzer we have introduced two special locations.

- The null location, which represents the concrete 'NULL' address described by the
C Standard. This location can be evaluated but it can be neither dereferenced nor
modified, i.e., $\mathrm{null} \in \mathrm{NonDeref}$, $\mathrm{null} \notin \mathrm{NonEval}$ and it is read-only.

- The undefined location, to be used as a target for all undefined pointers and
for pointers to deallocated memory. We have modelled the undefined
location as a non-evaluable location.

     1  int *q, a;
     2
     3  int **f() {
     4    if (...) return &q;
     5    else return 0;
     6  }
     7
     8  ...
     9  q = &a;
    10  // eval(*q) = {a}
    11  int **p = f();
    12  // eval(*p) = {q, null}
    13  int **p2 = p;
    14  // Null can be evaluated, thus copied.
    15  // eval(*p2) = {q, null}
    16  if (...) {
    17    ... = *p;
    18    // Null cannot be dereferenced.
    19    // eval(**p) = {a}
    20  } else {
    21    *p = ...;
    22    // Null cannot be written.
    23    // eval(*p) = {q}
    24  }

Listing 37: an example of dereferencing a null pointer.
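
The four steps of the extended dereference can be sketched in C on a toy encoding. This is a simplified illustration, not the analyzer's implementation: locations are assumed to be small integers, sets of locations are bit masks, the points-to arcs are stored in a hypothetical `targets` array, and the names `Abstraction`, `EvalResult`, `post` and `eval_deref` are invented here. The refinement of the abstract memory through the filter and the check on the base location are deliberately left out; only the resulting locations and the signalled errors are tracked.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_LOCS = 32 };
enum { ERR_DEREF = 1 << 0, ERR_EVAL = 1 << 1 };

typedef uint32_t LocSet;        /* bit i set <=> location i is in the set */

typedef struct {
    LocSet targets[MAX_LOCS];   /* targets[i]: locations pointed to by i */
    LocSet non_deref;           /* the set NonDeref */
    LocSet non_eval;            /* the set NonEval  */
} Abstraction;

typedef struct { LocSet locs; unsigned errors; } EvalResult;

/* POST: union of the targets of the locations in `from`. */
static LocSet post(const Abstraction *a, LocSet from) {
    LocSet out = 0;
    for (int i = 0; i < MAX_LOCS; ++i)
        if (from & ((LocSet)1 << i)) out |= a->targets[i];
    return out;
}

/* One dereference step applied to `inner`, the result of evaluating e. */
EvalResult eval_deref(const Abstraction *a, EvalResult inner) {
    EvalResult r = { 0, inner.errors };
    assert(a != 0);
    LocSet derefable = inner.locs & ~a->non_deref;         /* L_{0,N} */
    if (inner.locs & a->non_deref) r.errors |= ERR_DEREF;  /* L_{0,X} not empty */
    LocSet reached = post(a, derefable);                   /* L_1 */
    if (reached & a->non_eval) r.errors |= ERR_EVAL;       /* L_{1,X} not empty */
    r.locs = reached & ~a->non_eval;                       /* L_{1,N} */
    return r;
}
```

Encoding Listing 37 with p, q, a and null as locations 0 to 3, one step from {p} returns {q, null} without errors, and a second step returns {a} together with `ERR_DEREF`, because null belongs to NonDeref.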

Example 25. Consider the code in Listing 37. From the point of view of the analysis,
the function 'f' possibly returns null pointers, i.e., at line 12 we have EVAL(*p) =
{q, null}. The evaluation of the expression *p at line 21 and the evaluation of the
expression **p at line 17 produce the following sequences of steps.



(Limiting our view to the points-to analysis, non-dereference-able locations may be seen as
locations that cannot be read. On the other hand, the read-only locations proposed for the
assignment operation cannot be written. In this sense the value of the null location can be
neither read nor written: the null location can only be used as a target for pointers.)

(Recall that the syntax of the simplified language formalized in Section 3 is slightly different
from the syntax of the C language. Indeed, we do not distinguish between expressions and lvalues;
hence, for example, the C-expression 'p' occurring as the rhs of an assignment corresponds to *p in
our language, the C-expression '*p' as the rhs of an assignment corresponds to **p, while '*p' as
the lhs of an assignment remains the same.)

[Figure 29, three panels: the points-to abstraction before the evaluation of the
expression '*p' at line 17; the exceptional component resulting from filtering that
abstraction; the normal component resulting from filtering that abstraction, on which
the normal execution continues.]

Fig. 29: an example of analysis involving the special location null.



  i    EVAL(*p, i)     EVAL(**p, i)
  2                    {p}
  1    {p}             {null, q}
  0    {null, q}       {a, DerefError}

In this case, at line 17, the analysis warns about the possibility of a dereference of
a null pointer and continues the abstract execution assuming that 'p' is not null.
Instead, at line 21, the evaluation of the expression '*p' as the lhs of an assignment
does not raise any error and returns the set {null, q}. However, at this point the
assignment operation detects that the program is trying to modify the null location
and triggers an error, since we have modelled the null location as read-only. See
Figure 29 for a graphical representation of this example.

Example 26. Consider Listing 38. At line 5 the points-to information is
EVAL(*pp) = {p, undef} and the abstract evaluation of the expression '*pp' produces
the following sequence of steps.

  i    Result of step i
  2    {pp}
  1    {p, EvalError}
  0    {a}



In the step i = 1 of the evaluation, the algorithm detects the presence of the
non-evaluable location 'undef' and proceeds by removing it from the result of the
evaluation and by filtering the memory state against the condition (neq, *pp, undef).
As a result, the analysis is able to infer that after the execution of line 7 it holds that
EVAL(*pp) = {p}. Figure 30 shows a graphical representation of this situation.
Instead, at line 11, the variable 'pp' is reassigned without evaluating the undefined
location, and hence without producing any error.

(Again, using the formalization of the assignment presented in Section 3, the C-expression 'pp'
occurring as the rhs of an assignment corresponds to '*pp' in our formalization.)

     1  int *p, *q, **pp, a;
     2
     3  p = &a;
     4  if (...) pp = &p;
     5  // eval(*pp) = {p, undef}
     6  if (...) {
     7    ... = pp;
     8    // The undefined location cannot be evaluated.
     9    // eval(*pp) = {p}
    10  } else {
    11    pp = &q;
    12    // Assign pp without evaluating its value.
    13    // eval(*pp) = {q}
    14  }

Listing 38: an example of the evaluation of an undefined pointer due to an
uninitialized variable.

Example 27. Consider the example in Listing 39. At line 18 the points-to
information is

  EVAL(*pp) = {p, q},
  EVAL(*p) = {a, undef},
  EVAL(*q) = {a}.

At this point the abstract evaluation of the expression '*pp' produces the following
sequence of steps.

  i    EVAL_e(**pp, i)
  2    {pp}
  1    {p, q}
  0    {a, EvalError}



In the last step of the evaluation the algorithm detects the presence of the
non-evaluable location 'undef' and proceeds by removing this location from the result
of the evaluation. However, in this case the filter is unable to separate the exceptional
from the normal component, as illustrated in Figure 31.

(Again, using our formalization of the assignment operation, the C-expression '*pp'
corresponds to '**pp'.)

The idea of filtering away the exceptional component is formalized in [CDNB08].
Removing from the abstract execution state those exceptional configurations that have
already been signalled prevents the same error from being propagated by the analysis from the
first point to all the subsequent program points, which would pollute the results
of the analysis. Note that other semantics are also possible. For instance, it would
be possible to model the undefined location as a non-dereference-able location
instead of as a non-evaluable location. Under this assumption, uninitialized pointers
and pointers to deallocated memory can be evaluated and thus copied;
however, dereferencing them is still treated as an error. In the above formalization
we explicitly keep track of an approximation of the exceptional execution paths;
however, in many situations this is too expensive and of no use. In these cases the
implementation can simply skip the collection of the exceptional states and gather
only the signalled memory errors.

[Figure 30, three panels: the points-to abstraction before the evaluation of the
expression 'pp' at line 7; the exceptional component resulting from the filtering of
that abstraction; the normal component resulting from the filtering of that
abstraction, on which the execution continues.]

Fig. 30: an example of analysis involving the special location undefined.

5.15 Logical Operators 

The model described in Section 3 presents a very simplified definition of boolean
condition; for example, it does not consider the logical operators and (&&), or (||)
and not (!). The first step necessary in order to handle these operators is to
extend the set of conditions.

Definition 5.18. (Extended conditions.) Let 'ExtCond' be the set defined
as the language generated by the grammar

$$e ::= c \mid (\mathrm{not}\ e_0) \mid (e_0\ \mathrm{or}\ e_1) \mid (e_0\ \mathrm{and}\ e_1)$$

where $c \in \mathrm{Cond}$ is an atomic condition and $e_0, e_1 \in \mathrm{ExtCond}$ are two extended
conditions.

     1  int **pp, *p, *q, a;
     2
     3  q = &a;
     4  // eval(*q) = {a}
     5
     6  if (...) pp = &p;
     7  else pp = &q;
     8  // eval(*pp) = {q, p}
     9
    10  if (...) {
    11    int x;
    12    p = &x;
    13    ...
    14  } else {
    15    p = &a;
    16  }
    17  // eval(*p) = {a, undef}
    18
    19  ... = *pp;
    20  // Undef cannot be evaluated.
    21  // However, the filter cannot improve the precision.

Listing 39: an example of the evaluation of an undefined pointer, this time due to
a memory deallocation.

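The next step is to define the value of the new conditions.

Definition 5.19. (Concrete semantics of the extended conditions.) Let
$C \in \mathcal{C}$ and $c_0, c_1 \in \mathrm{ExtCond}$; then

$$C \models (\mathrm{not}\ c_0) \iff C \not\models c_0;$$

$$C \models (c_0\ \mathrm{and}\ c_1) \iff C \models c_0 \land C \models c_1;$$

$$C \models (c_0\ \mathrm{or}\ c_1) \iff C \models c_0 \lor C \models c_1.$$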
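
The recursive structure of this definition can be sketched in C as follows. The encoding is an assumption made for the example: a concrete memory is represented only by the truth values it assigns to the atomic conditions (the array `atom_truth`), and the names `ExtCond`, `CondKind` and `holds` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ATOM, NOT, AND, OR } CondKind;

/* An extended condition as a small syntax tree. */
typedef struct ExtCond {
    CondKind kind;
    size_t atom;                     /* index of the atomic condition (ATOM only) */
    const struct ExtCond *lhs, *rhs; /* operands; rhs is unused for NOT */
} ExtCond;

/* Returns true iff the memory, summarized by the truth values of the
   atomic conditions, satisfies the extended condition c. */
bool holds(const bool *atom_truth, const ExtCond *c) {
    assert(atom_truth != NULL && c != NULL);
    switch (c->kind) {
    case ATOM: return atom_truth[c->atom];
    case NOT:  return !holds(atom_truth, c->lhs);
    case AND:  return holds(atom_truth, c->lhs) && holds(atom_truth, c->rhs);
    case OR:   return holds(atom_truth, c->lhs) || holds(atom_truth, c->rhs);
    }
    return false; /* not reached for well-formed conditions */
}
```

Since the value of an extended condition is computed only from the values of its atomic conditions, any concrete filter expressed in terms of that value carries over to the extended conditions without changes.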

The definition of the concrete filter does not need to be updated, as it is expressed
in terms of the value of the conditions. Finally, we update the definition of the
abstract filter so as to handle the new conditions.

Definition 5.20. (Extended filter.) Let

$$\phi : \mathcal{A} \times \mathrm{ExtCond} \to \mathcal{A} \times \mathcal{A};$$

$$\phi : \mathcal{A} \times \mathcal{A} \times \mathrm{ExtCond} \to \mathcal{A} \times \mathcal{A}$$

be defined, for all $A, B \in \mathcal{A}$ and $e, f \in \mathrm{Expr}$, as

$$\phi(A, B, (\mathrm{eq}, e, f)) \overset{\mathrm{def}}{=} \bigl( \phi(A, (\mathrm{eq}, e, f)),\ \phi(B, (\mathrm{neq}, e, f)) \bigr);$$

$$\phi(A, B, (\mathrm{neq}, e, f)) \overset{\mathrm{def}}{=} \bigl( \phi(A, (\mathrm{neq}, e, f)),\ \phi(B, (\mathrm{eq}, e, f)) \bigr);$$

for all $c_0, c_1 \in \mathrm{ExtCond}$, as

$$\phi(A, B, \mathrm{not}\ c_0) \overset{\mathrm{def}}{=} \phi(B, A, c_0);$$

$$\phi(A, B, c_0\ \mathrm{or}\ c_1) \overset{\mathrm{def}}{=} \phi\bigl(A, B, \mathrm{not}\ ((\mathrm{not}\ c_0)\ \mathrm{and}\ (\mathrm{not}\ c_1))\bigr);$$

$$\phi(A, B, c_0\ \mathrm{and}\ c_1) \overset{\mathrm{def}}{=} \bigl( A_0 \cap A_1,\ B_0 \cup (A_0 \cap B_1) \bigr);$$

where

$$\phi(A, B, c_0) = (A_0, B_0); \qquad \phi(A, B, c_1) = (A_1, B_1).$$

Finally, for all $A \in \mathcal{A}$ and $c \in \mathrm{ExtCond}$, we define

$$\phi(A, c) \overset{\mathrm{def}}{=} \phi(A, A, c).$$

[Figure 31, three panels: the points-to abstraction $m^\sharp$ before the evaluation of
the expression '*pp' at line 19; a concrete memory description $m_0$, model of the
condition $c = (\mathrm{neq}, **pp, \mathrm{undef})$ and approximated by $m^\sharp$; another concrete
memory description $m_1$, model of the condition $c$ and approximated by $m^\sharp$. Note
however that $\alpha(\{m_0, m_1\}) = m^\sharp$, that is, the filter cannot remove any arc.]

Fig. 31: a representation of the situation of Listing 39.

In this formalization of the filter operation, $\phi(A, c)$ returns a pair of abstract
memories: the first component is an approximation of the states of $A$ in which the
condition $c$ is true; the second is an approximation of the states of $A$ in which the
condition $c$ is false. In this definition we have mentioned only the equality and
inequality operators; however, it can be easily extended to cover the relational
operators (Section 5.13).

Note that, as shown in Section 4, the formulation of the filter (Definition 3.18)
is not optimal, and Example 15 shows that iterating the application of the filter
can improve the precision. In Figure 32 we show that, even with a filter
that is optimal on the atomic conditions, iterating the application of the filter can
improve the precision. Note indeed that on the atomic conditions '****p4 == &a'
and '***q3 == &b' the filter operates optimally (Definition 3.18).

6. CONCLUSIONS AND FUTURE DEVELOPMENTS 

Alias analysis is an important step in the process of static analysis of programs.
Compiler-oriented applications are the most common clients of alias information.
However, compilers put the focus on fast analyses, whereas verifier-oriented
applications require precise but slower techniques. The present work, trying to address
the needs of verifiers, discusses one of the most common methods used to model the
aliasing problem: the points-to representation. Known results are presented within a
formal model; a novel filter operation is described and, finally, a formal proof of
correctness of the presented method is reported.

A working prototype of the method has been implemented as part of the ECLAIR
system, which targets the analysis of mainstream languages by building upon
CLAIR, the 'Combined Language and Abstract Interpretation Resource', which
was initially developed and used in a teaching context (see
http://www.cs.unipr.it/clair/).

However, many tasks have to be completed. Some of the features of the C 
language are still missing. One of the questions not answered is how it is possible 
to exploit the knowledge of the architecture/compiler target of the analysis process. 
For instance, the precise handling of unions and casts requires the knowledge of 
the relative size of basic types, the alignment issues and all the details that relate 
to the memory layout. 

The memory model described in Section 5.3 and implemented makes strong hypotheses 
about the correctness of the type information. For example, the described 
abstract memory does not allow pointers of type char* that result from casts of 
pointers to objects of other types to be tracked precisely. Though the literature 
contains some proposals on how to avoid relying on type information [WL95] and 
how to analyze unions and casts [Min06], it is unclear whether these can be applied 
to our situation. On the other hand, the memory model does not require any special 
information about the type of variables. For instance, our analysis is able, 'out 
of the box', to track pointers that are cast and assigned to integers. Architecture-specific 
information is also required in order to resolve the many implementation-defined 
behaviours present in the C Standard. When the behaviour of the analyzed program 
depends on these rules of the language, the analyzer, if not provided with additional 
information, can only warn and proceed with a conservative approximation of the 
execution that very often degenerates to the top approximation within a few steps. 
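
As an illustration of the two kinds of casts mentioned above, the following C 
fragment is a minimal sketch; the function names zero_via_char and round_trip are 
hypothetical and are not taken from the prototype. The first function accesses an 
int object through an unsigned char* obtained from a cast, which is the pattern that 
the abstract memory cannot track precisely. The second converts a pointer to an 
integer and back, which is the pattern that the analysis is said to handle. The use 
of uintptr_t assumes that the implementation provides this optional type. 

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clears an int by writing each of its bytes
   through an unsigned char* obtained from a cast. */
void zero_via_char(int *p) {
    assert(p != NULL);
    unsigned char *bytes = (unsigned char *)p;
    for (size_t i = 0; i < sizeof *p; ++i) {
        bytes[i] = 0;
    }
}

/* Hypothetical helper: stores a pointer in an integer and recovers it.
   uintptr_t is optional in C99; its availability is assumed here. */
int *round_trip(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

For example, given an int variable x, round_trip(&x) compares equal to &x, and 
after zero_via_char(&x) the value of x is 0. 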
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A representation of the initial points-to informa- 
tion. 



Filtering the points-to information against the expression '**q3' and the target 
set {b}. The arc (q1, a) is removed. 

Filtering the points-to information against the expression '***q4' and the target 
set {a}. The arc (p2, q1) is removed. 

Filtering the points-to information against the expression '**q3' and the target 
set {b}. The arc (q3, p2) is removed. 

Filtering the points-to information against the expression '***q4' and the target 
set {a}. The arc (p4, q3) is removed. 

Fig. 32: An example showing that iterating the filter on more conditions can improve the 
precision of the approximation. 
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Consider for instance off-by-one locations. Currently, the memory model reserves 
an explicit abstract location address to represent off-by-one locations only at the 
end of arrays; this means that scalar variables do not have a corresponding off-by-one 
location. Hence, the current implementation forbids pointer arithmetic on 
the address of a scalar object, even when the increment is equal to 1, though the 
C Standard allows it [Int99, 6.5.6.7]. Moreover, in the presented formulation, the 
handling of pointer arithmetic on arrays assumes that the off-by-one address never 
overlaps with another valid location, though this is allowed by the standard [Int99, 
6.5.9.6]. 
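
The two situations cited above can be illustrated by the following C sketch; all 
the function names are hypothetical. The function count_range iterates over a 
half-open range delimited by an off-by-one pointer. The function scalar_length 
applies it to a scalar object, which for the purposes of pointer arithmetic behaves 
like an array of length one. The function one_past_meets_next compares the 
off-by-one address of the first row of a two-dimensional array with the start of the 
second row, a case where the off-by-one address coincides with a valid location. 

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the elements in the half-open range
   [first, last), where last may be an off-by-one pointer. */
size_t count_range(const int *first, const int *last) {
    assert(first != NULL && last != NULL);
    size_t n = 0;
    for (const int *p = first; p != last; ++p) {
        ++n;
    }
    return n;
}

/* Forms the off-by-one address of a scalar: &x + 1 is a valid pointer
   value as long as it is not dereferenced. */
size_t scalar_length(void) {
    int x = 0;
    return count_range(&x, &x + 1);
}

/* Compares the off-by-one address of row 0 with the start of row 1.
   The rows of an array are contiguous, so the two addresses coincide. */
int one_past_meets_next(void) {
    int m[2][2] = {{0, 0}, {0, 0}};
    const int *one_past_row0 = m[0] + 2;
    const int *start_row1 = &m[1][0];
    return one_past_row0 == start_row1;
}
```

Here scalar_length() returns 1, and one_past_meets_next() returns 1 because the 
second row immediately follows the first in the address space. 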

To increase the precision of the provided alias analysis it would be possible to 
couple the points-to analysis with a shape analysis that would produce a more 
precise approximation of recursive data structures [Deu94]. 
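
To make the limitation concrete, consider the following C sketch of a recursive 
data structure; the names struct node, push_front, list_front, list_length and 
list_free are hypothetical. Suppose, as an assumption made here only for 
illustration, that the analysis summarizes all the objects allocated by the single 
malloc call into one abstract location. Then the next field of that location points 
to the location itself, and the points-to information alone cannot distinguish an 
acyclic list from a cyclic one. This is the kind of property that a shape analysis 
is meant to recover. 

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Every node of every list is allocated at this single malloc call. */
struct node *push_front(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;  /* allocation failed: the list is left unchanged */
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Returns the value stored in the first node of a non-empty list. */
int list_front(const struct node *head) {
    assert(head != NULL);
    return head->value;
}

/* Terminates only if the list is acyclic, a fact that is not visible
   in a summarized points-to graph. */
size_t list_length(const struct node *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        ++n;
    }
    return n;
}

void list_free(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```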

As for the implementation, a complete experimental evaluation of the proposed 
technique is still needed in order to produce quantitative data for the 
comparison with other approaches. 
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